The Legislative Agenda in Argentina. A Latent Dirichlet
Allocation Application.
Constanza F. Schibber
Washington University in St. Louis
This version: November 4, 2014
Classification of Bills. A legislator’s agenda is composed of the legislation she introduces. A
number of studies rely on the bills introduced by legislators to measure their policy agenda. Taylor-Robinson and David (2002) and Crisp et al. (2004) classify legislation introduced according to its
geographic focus to show that legislators with personal vote-seeking incentives are more likely to introduce bills
focusing on the electoral district than to engage in national policy. To do this they rely
on detailed coding rules and classify initiatives by looking at the title of the bills. Gamm and
Kousser (2010), focusing on the legislation introduced in U.S. subnational legislatures, also classify
the legislation according to whether it focuses on a broad issue or on particularistic issues. They
achieve this by considering bills tailored to the district of the legislator as particularistic and policies
that are not tailored to address the district’s concerns as broad. Other studies have only focused
on particularistic legislation. For instance, Chasquetti and Micozzi (2014) and Micozzi (2013)
classify bills sponsored by legislators in Uruguay and Argentina as targeting a district when the
title mentions the name of the subnational unit the sponsor belongs to. They then count
how many targeted bills a legislator sponsors during their term in office, without
considering the rest of the legislation introduced by the legislator.
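The title-based rule described above can be sketched in a few lines. This is only an illustration of the general idea, not the coding scheme used in the cited studies: the helper names `normalize` and `is_targeted`, the accent-stripping step, and the example bills are all hypothetical.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase and strip accents so that 'Tucumán' matches 'tucuman'."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.lower()


def is_targeted(title, sponsor_district):
    """True when the sponsor's district name appears as a whole phrase in the title."""
    pattern = r"\b" + re.escape(normalize(sponsor_district)) + r"\b"
    return re.search(pattern, normalize(title)) is not None


# Hypothetical bills: (title, district of the primary sponsor).
bills = [
    ("Construcción de un puente sobre la ruta 9 en Tucumán", "Tucumán"),
    ("Modificación del Código Penal", "Salta"),
    ("Declárese zona de emergencia a la provincia de Salta", "Salta"),
]

# One flag per bill, then the count of targeted bills.
targeted_flags = [is_targeted(title, district) for title, district in bills]
targeted_count = sum(targeted_flags)
```

A count built this way says nothing about the non-targeted bills, which is the limitation noted above.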
The Data. I rely on the text of the legislation introduced by legislators and, more specifically, on the
text in which a sponsor explicitly states why she is presenting the bill. This text must be attached
to each bill introduced in Congress and it allows legislators to signal to their supporters the topic
of the proposal. As a result, by using this specific text we can tap into legislators’ motivations
when drafting the proposal. I collected the text of all legislative initiatives introduced between
2000 and 2011 from the website for the lower chamber of the Congress of Argentina, along with
individual-level information about the primary sponsor (name, political party, electoral district, and
mandate). The total number of bills is 12,073 and, on average, each legislator introduced 21 bills.
All legislators that served during this time period introduced at least one bill.
The Model. In Latent Dirichlet Allocation (LDA), each document is viewed as a mixture of several
topics (Blei, Ng and Jordan, 2003). Hence, an advantage of LDA is that it is not necessary to know
a priori the topics/categories present in the documents.1 For instance, a public health bill might
have words drawn from a topic related to seasons, such as winter, and words drawn from a topic
related to illnesses, such as flu. The core assumption is that words w in documents are generated
by a mixture of latent topics z; thus, the probability of observing a word depends on the probability
of the word given each topic and the probability of that topic. Formally,
\[
P(w) = \sum_{z} P(w \mid z) \, P(z)
\]
Furthermore, LDA follows the bag-of-words model and hence makes no assumptions about the
order of the words in the documents.
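As a small worked illustration of the mixture formula and the bag-of-words assumption, the sketch below uses two made-up topics and a three-word vocabulary. All numbers are hypothetical, and P(z) is treated as fixed for a single document.

```python
import math

# Hypothetical toy values: two topics and a three-word vocabulary.
p_topic = {"seasons": 0.4, "illnesses": 0.6}  # P(z)
p_word_given_topic = {  # P(w | z); each inner dictionary sums to one
    "seasons": {"winter": 0.7, "flu": 0.1, "vaccine": 0.2},
    "illnesses": {"winter": 0.1, "flu": 0.5, "vaccine": 0.4},
}


def word_probability(word):
    """P(w) = sum over topics z of P(w | z) * P(z)."""
    return sum(p_word_given_topic[z][word] * p_topic[z] for z in p_topic)


def log_likelihood(words):
    """Log-probability of a word list when each word is drawn independently from P(w)."""
    return sum(math.log(word_probability(w)) for w in words)


# For "flu": 0.1 * 0.4 + 0.5 * 0.6 = 0.34
p_flu = word_probability("flu")

# Bag of words: reordering the same words leaves the likelihood unchanged.
original_order = log_likelihood(["winter", "vaccine", "flu"])
shuffled_order = log_likelihood(["flu", "winter", "vaccine"])
```

Because the likelihood is a product over words, only word counts matter, which is what the bag-of-words assumption amounts to here.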
An LDA model consists of three levels: corpus, document, and word. Following the notation of
Blei, Ng and Jordan (2003), who initially proposed the LDA model, assume K topics, where each
topic β_k is drawn from a Dirichlet distribution. A draw from a K-dimensional Dirichlet returns a
K-dimensional vector of multinomial probabilities, θ_d in this case, whose K values must sum to one. Given this, the LDA
process, for each document in a collection of documents, is as follows:
1. Draw a topic distribution θ_d ∼ Dir(α), where Dir(·) denotes a symmetric (uniform) Dirichlet
distribution with scaling parameter α.
2. For each word n in the document:
(a) Draw a specific topic z_{d,n} ∼ Multi(θ_d), where Multi(·) is a multinomial distribution.
(b) Draw a word w_{d,n} ∼ Multi(β_{z_{d,n}}).
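The generative steps above can be simulated directly. The following is a minimal sketch using only the Python standard library, with the topics held fixed. The vocabulary, the two topics, the value of α, and the function names are hypothetical choices for illustration. It simulates the data-generating process only and does not estimate a model.

```python
import random


def draw_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def generate_document(n_words, alpha, topics, vocabulary, rng):
    """Follow steps 1, 2(a) and 2(b) for a single document."""
    theta_d = draw_dirichlet(alpha, rng)  # step 1: topic proportions
    topic_ids = list(range(len(topics)))
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta_d)[0]  # step 2(a): topic of this word
        w = rng.choices(vocabulary, weights=topics[z])[0]  # step 2(b): the word itself
        assignments.append(z)
        words.append(w)
    return theta_d, assignments, words


# Hypothetical vocabulary and topics (each topic is a distribution over words).
vocabulary = ["winter", "snow", "flu", "vaccine"]
topics = [
    [0.5, 0.5, 0.0, 0.0],  # a "seasons" topic
    [0.0, 0.0, 0.5, 0.5],  # an "illnesses" topic
]

rng = random.Random(42)  # fixed seed so the simulation is repeatable
theta_d, assignments, words = generate_document(
    n_words=10, alpha=[0.5, 0.5], topics=topics, vocabulary=vocabulary, rng=rng
)
```

Since a topic is drawn separately for every word, a single simulated document can mix words from both topics whenever θ_d puts weight on both.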
The parameters α and β are corpus-level parameters. Then, θ_d is a document-level variable sampled once per document. Last, z_{d,n} and w_{d,n} are word-level variables and are sampled
once for each word in each document. Because a topic is sampled anew for every word within
a document, documents can be associated with multiple topics rather than just one. The
posterior distribution of this model is intractable, so it cannot be computed directly; thus, I use a
Variational Bayes (VB) optimization that has been shown to be as reliable as the partially collapsed
Gibbs sampler but faster (Blei, Ng and Jordan, 2003).
1. In Political Science, Quinn et al. (2010) and Grimmer (2010) have used variations of this model to study debates
in Congress and legislators’ press releases, for instance.
50 Topics. Remember that each document is viewed as a mixture of several topics. In other
words, given the 50 topics uncovered by the model, a document has a probability of containing
each of these topics. The following tables present the key words by topic. Tables 1 to 3
include the top words translated into English, and Table 4 presents the original words in
Spanish. I assigned the label for each topic based on its top words; I also drew random samples of
the bills with the highest probability of including the topic and read their text. I classify
8 topics as referring to credit-claiming activities (highlighted in blue in the tables), as they are
particularistic bills. These are the topics labeled Public Works, Local Resources, Emergency
Resources, Tourism, Provinces, Local-Historical Issues, Patagonia, and Customs & Tariffs.
Table 1: Top Words by Topic (in English) and Labels. Part 1.
Label | Top Words
Public Works | route, works, road, roads, Salta, transit, construction, roads, province, construction
Local Resources | executive, provinces, resources, domain, subsection, matter, constitutional, council
Emergency Resources | budget, emergency, finance, executive, resources, approval, Maria, fund, exposed, Alfredo
Tourism | tourism, student, corridor, Bariloche, tourist, guides, activity, mountain, activities
Provincial Issues | provinces, province, region, provincial, resources, federal, provincial, San, Santa, population
Local Issues | city, province, San, Aires, history, Juan, culture, historical, Argentine, town
Patagonian Issues | province, south, islands, Antarctica, Fuego, Antarctic, Tierra, Atlantic, Ushuaia, provincial
Customs & Tariffs | insurance, customs, contract, insurance, insurer, customs, duties, insured, civil
Table 2: Top Words by Topic (in English) and Labels. Part 2.
Label | Top Words
Public Health | data, health, information, medical, organs, medicine, patient, donation, medical, medical
Social Security | workers, regime, contributions, benefit, pension, allowances, benefits, age, pensions, beneficiaries
Disabilities | weapons, disability, internet, fire, gun, access, disabled, states, approval, possession
Health Insurance | consumer, consumers, medicines, defense, consumption, users, health, products, user, information
Traffic and Driving | accidents, traffic, vehicles, vehicle, alcohol, life, transport, cases, drivers, identity
Fishing and Water Resources | politics, countries, world, life, water, fisheries, population, peoples, human, resources
Bureaucracy | federal, control, city, Aires, government, public, information, administration, public, functions
Smoking | tobacco, smoking, cigarettes, smokers, mining, consumption, countries, advertising, tobacco use, states
Labor Legislation | agreement, international, parties, measures, organization, countries, employment, provisions, application, information
Crime | criminal, offenses, crime, violence, minors, code, children, victims, freedom, type
Judicial Procedures | justice, judicial, process, judge, cases, judgment, principle, procedure, rule, action
Christian Values | women, woman, church, men, life, men, Christ, gender, equality
Civil Law | civil, code, time, responsibility, application, property, agreement, assets, cases, activity
Patriotism | flag, Aires, blue, colors, Belgrano, provinces, white, light blue, white
Public Debt | debt, dollars, millions, debts, government, foreign, pesos, financial, investments
Legislative Process | executive, congress, necessity, decrees, chamber, laws, legislative, sanction, subsection, urgency
Labor Conditions | workers, worker, labor, contract, employer, labor, hours, relation, employment, conditions
Taxes | tax, funds, fiscal, financing, expenses, taxes, profits, income, resources, budget
Children & Family | human, convention, family, child, protection, discrimination, parents, woman, international, children
Illnesses | health, life, prevention, cases, disease, population, public, risk, action, diseases
Education | education, society, policies, social, programs, resources, youth, life, needs, training
Arbitration | diversity, parties, biological, resources, arbitral, contracting, variety, housing, breeder, arbitration
Table 3: Programmatic Bills. Top Words by Topic (in English) and Labels. Part 3.
Label | Top Words
Aviation | transport, air, activity, aerial, aircraft, civil, aviation, passengers, flight, aeronautics
Marriage & Civil Union | woman, marriage, civil, discrimination, equality, union, children, spouses, family, code
Sexual Harassment | sexual, energy, harassment, conduct, harassment, victim, code, employment, cases
Sports | societies, society, activity, sport, soccer, entities, associations, broadcasting, purposes, television
Elections | parties, politicians, political, electoral, political, democracy, representation, candidates, participation, council
Cooperatives | cooperatives, cooperative, music, theater, culture, property, business, identity, associated, owned
Industry and Business | sector, production, market, companies, activity, gas, products, economy, industry, countries
Environment | waste, environmental, environment, treatment, disposal, hazardous, minimum, final, budgets, authority
Freedom of Expression | media, freedom, communication, information, press, expression, medium, response, dissemination, activity
Property | address, necessary, consortia, consortium, horizontal, property, evaluation, administrator
Judicial System 1 | court, justice, supreme, tribunal, laws, constitutional, human, judicial, federal, international
Agriculture & Livestock | food, products, production, producers, agriculture, food, quality, livestock, market, tons
Higher Education | university, education, universities, level, higher, studies, training, population, students, professional
Banking System | bank, credit, payment, housing, credits, institutions, debts, financial, finance, obligations
Energy | energy, environment, resources, natural, production, environmental, environment, species, water, areas
Transparency | corruption, criminal, legal, society, justice, transparency, legal, institutions, acts, economic
Alcoholic Beverages | wood, steel, beverages, stainless, engine, hp, alcohol, drawers, alcoholic, table
Judicial System 2 | federal, court, justice, courts, creation, client, cases, jurisdiction, federal, defender
Public Utilities | users, companies, rates, public, concession, contracts, contract, works, prices, company
International Economics | tourism, countries, trade, activity, international, foreign, sector, millions, economy
Table 4: Top Words by Topic (in Spanish) and Labels
Label | Top Words
Public Health | datos, salud, informacion, medico, organos, medicina, paciente, donacion, medica, medicos
Social Security | trabajadores, regimen, aportes, beneficio, previsional, prestaciones, beneficios, edad, pensiones, beneficiarios
Disabilities | armas, discapacidad, internet, fuego, arma, acceso, discapacitados, estados, aprobacion, tenencia
Health Insurance | consumidor, consumidores, medicamentos, defensa, consumo, usuarios, salud, productos, usuario, informacion
Traffic and Driving | accidentes, transito, vehiculos, vehiculo, alcohol, vida, transportes, casos, conductores, identidad
Fishing and Water Resources | politica, paises, mundo, vida, agua, pesca, poblacion, pueblos, humano, recursos
Bureaucracy | federal, control, ciudad, aires, gobierno, publica, informacion, administracion, publico, funciones
Smoking | tabaco, fumar, cigarrillos, fumadores, minera, consumo, paises, publicidad, tabaquismo, estados
Labor Legislation | convenio, internacional, partes, medidas, organizacion, paises, empleo, disposiciones, aplicacion, informacion
Crime | penal, delitos, delito, violencia, menores, codigo, ninos, victimas, libertad, tipo
Judicial Procedures | justicia, judicial, proceso, juez, casos, juicio, principio, procedimiento, norma, accion
Christian Values | mujeres, mujer, iglesia, hombres, vida, varones, cristo, genero, igualdad
Civil Law | civil, codigo, tiempo, responsabilidad, aplicacion, propiedad, acuerdo, bienes, casos, actividad
Patriotism | bandera, aires, azul, colores, belgrano, provincias, blanco, celeste, blanca
Public Debt | deuda, dolares, millones, deudas, gobierno, externa, pesos, financiero, inversiones
Public Work | ruta, obras, vial, rutas, salta, transito, obra, caminos, provincia, construccion
Legislative Process | ejecutivo, congreso, necesidad, decretos, camara, leyes, legislativo, sancion, inciso, urgencia
Labor Conditions | trabajadores, trabajador, laboral, contrato, empleador, laborales, horas, relacion, empleo, condiciones
Taxes | impuesto, fondos, fiscal, financiamiento, gastos, impuestos, ganancias, ingresos, recursos, presupuesto
Family and Children | humanos, convencion, familia, nino, proteccion, discriminacion, padres, mujer, internacional, ninos
Illnesses | salud, vida, prevencion, casos, enfermedad, poblacion, publica, riesgo, accion, enfermedades
Education | educacion, sociedad, politicas, sociales, programas, recursos, jovenes, vida, necesidades, formacion
Housing | diversidad, partes, biologica, recursos, arbitral, contratante, variedad, vivienda, obtentor, arbitraje
Aeronautical Transportation | transporte, aereo, actividad, aerea, aeronaves, civil, aviacion, pasajeros, vuelo, aeronautica
Family | mujer, matrimonio, civil, discriminacion, igualdad, union, hijos, conyuges, familia, codigo
Local Resources | ejecutivo, provincias, recursos, dominio, inciso, materia, constitucional, consejo
Emergency Resources | presupuesto, emergencia, hacienda, ejecutivo, recursos, aprobacion, maria, fondo, expuesto, alfredo
Sexual Harassment | sexual, energia, acoso, conducta, hostigamiento, victima, codigo, empleo, casos
Sports | sociedades, sociedad, actividad, deporte, futbol, entidades, asociaciones, radiodifusion, fines, television
Elections | partidos, politicos, politica, electoral, politico, democracia, representacion, candidatos, participacion, consejo
Cooperatives | cooperativas, cooperativa, musica, teatro, cultura, propiedad, empresa, identidad, asociados, participada
Industry and Business | sector, produccion, mercado, empresas, actividad, gas, productos, economia, industria, paises
Tourism | turismo, estudiantil, corredor, bariloche, turistico, guias, actividad, montana, actividades
Environment | residuos, ambiental, ambiente, tratamiento, disposicion, peligrosos, minimos, final, presupuestos, autoridad
Provinces | provincias, provincia, region, provinciales, recursos, federal, provincial, san, santa, poblacion
Freedom of Expression | medios, libertad, comunicacion, informacion, prensa, expresion, medio, respuesta, difusion, actividad
Property | domicilio, necesario, consorcios, consorcio, horizontal, propiedad, evaluacion, administrador
Judicial System 1 | corte, justicia, suprema, tribunal, leyes, constitucional, humanos, judicial, federal, internacional
Local - historical issues | ciudad, provincia, san, aires, historia, juan, cultura, historico, argentino, pueblo
Patagonia | provincia, sur, islas, antartida, fuego, antartico, tierra, atlantico, ushuaia, provincial
Agriculture & Livestock | alimentos, productos, produccion, productores, agricultura, alimentaria, calidad, ganaderia, mercado, toneladas
Higher Education | universidad, educacion, universidades, nivel, superior, estudios, formacion, poblacion, estudiantes, profesional
Banking System | banco, credito, pago, vivienda, creditos, entidades, deudas, financiero, finanzas, obligaciones
Customs and Tariffs | seguros, aduanero, contrato, seguro, asegurador, aduana, tasas, asegurado, civil
Energy | energia, ambiente, recursos, naturales, produccion, ambiental, medio, especies, agua, areas
Transparency | corrupcion, penal, juridica, sociedad, justicia, transparencia, juridicas, instituciones, actos, economico
Alcohol | madera, acero, bebidas, inoxidable, motor, hp, alcohol, cajones, alcoholicas, mesa
Judicial System 2 | federal, juzgado, justicia, juzgados, creacion, cliente, causas, competencia, federales, defensor
Public Utilities | usuarios, empresas, tarifas, publicos, concesion, contratos, contrato, obras, precios, empresa
One Topic by Bill. To assign one topic to each bill, I identify, for each initiative, the topic with the
highest probability and assign that topic to the bill.2 Figure 1 shows the
frequency distribution of topics. The most common topic is ‘Economic Activity’ (7.2% of the bills),
which includes initiatives referring to the production of goods, companies, and industries. Other
frequent topics are ‘Education’ (5.8%) and ‘Judicial Procedures’ (5.6%). The least frequent topics
are ‘Sexual Harassment’ (0.17%) and ‘Transparency’ (0.23%).
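The assignment rule described above amounts to taking, for each bill, the topic with the largest estimated probability and then tabulating the results. A minimal sketch follows, assuming the fitted model provides one row of topic probabilities per bill. The matrix `doc_topic`, the three labels, and all values are hypothetical and are not the paper's estimates.

```python
from collections import Counter

# Hypothetical document-topic probabilities (each row sums to one).
labels = ["Education", "Judicial Procedures", "Transparency"]
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.15, 0.60, 0.25],
    [0.55, 0.30, 0.15],
    [0.05, 0.15, 0.80],
]


def most_likely_topic(probabilities):
    """Index of the largest probability; ties go to the lowest index."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# One label per bill, then the share of bills (in percent) assigned to each topic.
assigned = [labels[most_likely_topic(row)] for row in doc_topic]
counts = Counter(assigned)
shares = {label: 100.0 * counts[label] / len(assigned) for label in labels}
```

This hard assignment discards the remaining probability mass. A bill split 0.55 / 0.30 / 0.15 is counted only under its first topic.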
Figure 1: Frequency of Topics for 16,644 initiatives in the Lower Chamber of Argentina. [Histogram; x-axis: Topics (0 to 50); y-axis: Frequency (0 to 500).]
2. I also implemented a hierarchical clustering algorithm and a K-means algorithm, but both yielded unsatisfactory
results. Comparing the topic assigned to the bill with the title of the bill revealed many inconsistencies.
References
Blei, David M, Andrew Y Ng and Michael I Jordan. 2003. “Latent Dirichlet Allocation.” Journal
of Machine Learning Research 3:993–1022.
Carey, J.M. 2007. “Competing principals, political institutions, and party unity in legislative voting.” American Journal of Political Science pp. 92–107.
Chasquetti, Daniel and Juan Pablo Micozzi. 2014. “The Subnational Connection in Unitary
Regimes: Progressive Ambition and Legislative Behavior in Uruguay.” Legislative Studies Quarterly 39(1):87–112.
Crisp, Brian F., Maria C. Escobar-Lemmon, Bradford S. Jones, Mark P. Jones and Michelle M.
Taylor-Robinson. 2004. “Vote-Seeking Incentives and Legislative Representation in Six Presidential Democracies.” The Journal of Politics 66(03):823–846.
Gamm, Gerald and Thad Kousser. 2010. “Broad bills or particularistic policy? Historical patterns
in American state legislatures.” American Political Science Review 104(01):151–170.
Grimmer, Justin. 2010. “A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases.” Political Analysis 18(1):1–35.
King, Gary, Michael Tomz and Jason Wittenberg. 2000. “Making the most of statistical analyses:
Improving interpretation and presentation.” American Journal of Political Science pp. 347–361.
Levendusky, Matthew S and Jeremy C Pope. 2010. “Measuring Aggregate-Level Ideological Heterogeneity.” Legislative Studies Quarterly 35(2):259–282.
Micozzi, Juan Pablo. 2013. “Does Electoral Accountability Make a Difference? Direct Elections,
Career Ambition, and Legislative Performance in the Argentine Senate.” The Journal of Politics
75(01):137–149.
Quinn, Kevin M, Burt L Monroe, Michael Colaresi, Michael H Crespin and Dragomir R Radev. 2010.
“How to analyze political attention with minimal assumptions and costs.” American Journal of
Political Science 54(1):209–228.
Taylor-Robinson, M.M. and SJ David. 2002. “Who Participates and Who Is Seen and Not Heard?
Evidence From the Honduran Congress.” Journal of Legislative Studies 8(1):10–36.
Tsai, Tsunghan and Jeff Gill. 2013. “Interactions in Generalized Linear Models: Theoretical Issues
and an Application to Personal Vote-Earning Attributes.” Social Science 2:91–113.