Sometimes computer folks simply see the world a little differently than everyone else. NewScientist (14 November 2015, paywall) reports on the latest attempt to do something serious in the world of politics with Big Data:
There is a long history of companies and other vested interests influencing legislation by lobbying politicians. So researchers at the University of Chicago’s Data Science for Social Good programme have created the Legislative Influence Detector. This scours the text of US bills, searching for passages that have been cribbed from lobbyists or the legislatures of other states.
“Our hope is that the public can use this to keep the government accountable,” says team member Julian Katz-Samuels, now a graduate student at the University of Michigan.
To get the real story behind a bill, the software digs through 500,000 state bills, as well as thousands of pieces of text drafted by lobbyist groups that were saved into a database. An algorithm then calculates the top 100 documents most relevant to the bill in question before examining each one more closely, searching for passages the two have in common.
What this reveals can be telling, says Katz-Samuels. The software can turn up lines of text originally written by activists and special interest groups. Or it might find that the bill borrows largely from laws already in place elsewhere, giving concerned citizens the chance to explore how the policy worked out there.
The Legislative Influence Detector project is located here. They explain:
To solve this problem, we have created a tool we call the “Legislative Influence Detector” (LID, for short). LID helps watchdogs turn a mountain of text into digestible insights about the origin and diffusion of policy ideas and the real influence of various lobbying organizations. LID draws on more than 500,000 state bills (collected by the Sunlight Foundation) and 2,400 pieces of model legislation written by lobbyists (collected by us, ALEC Exposed, and other groups), searches for similarities, and flags them for review. LID users can then investigate the matches to look for possible lobbyist and special interest influence.
The screenshot below shows LID at work. On the left-hand side is text from Wisconsin Senate Bill 179 (2015), which bans most abortions past the 19th week of pregnancy. On the right-hand side, LID found and presented SB 179’s highest-ranked match, Louisiana Senate Bill 593 (2012). The highlighting shows that these text sections match each other almost perfectly. Where differences exist, they are usually misspellings like “neurodeveolopmental” or formatting differences like “16”/“sixteen”.
LID finds legislative influence more quickly and easily than other tools. Reading bills manually takes too long. Google helps, but users can only search for short strings, not complete documents, and they must weed through many non-legislative results to find good matches. Inspired by Wilkerson, Smith, and Stramp (2015) and Hertel-Fernandez and Kashin, LID takes seconds to use, searches the entire document for matches, and returns only state bills and model legislation in the results.
Will this have an appreciable effect? Hard to say. I’m not feeling the love, but I’m self-aware enough to know that sometimes my skepticism is ill-worn. Thinking it through, the results will depend heavily on the quality of the data the tool works from: while the text of most or all laws (proposed and passed) is available online from known sources, text from other sources, such as ALEC, will not be as freely available, and those will be the important ones if you want to find the “silent influencers”. I’d say, in fact, that finding similar laws will be in everyone’s interests, while corporate, foreign, and other sources will not want to be known as they try to manipulate laws to their own benefit. The ability of LID to determine that a collection of disparate bills might have a common source, and then begin deducing what that source might be, could be an interesting future phase for the project.
They do complain about performance problems of their software. I wonder how they’ve coded this baby up …
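For the curious, here is my own guess at the two-stage shape described above: rank every stored document against the bill, keep the top 100, then look more closely at each survivor for a shared passage. This is a minimal sketch, not the project's actual code. The function names, the bag-of-words cosine ranking, and the use of Python's `difflib` for the close comparison are all my assumptions, picked because they fit in a few lines.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(query, corpus, k=100):
    """Stage 1 (assumed): rank documents by similarity to the bill, keep the top k.

    `corpus` is a hypothetical dict mapping a document name to its text.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name)
              for name, text in corpus.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]


def shared_passage(query, candidate, min_words=5):
    """Stage 2 (assumed): longest run of consecutive words both texts share.

    Returns None when the longest shared run is shorter than min_words.
    """
    a, b = tokenize(query), tokenize(candidate)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    if m.size < min_words:
        return None
    return " ".join(a[m.a:m.a + m.size])
```

Calling `top_candidates(bill_text, corpus)` and then `shared_passage(bill_text, corpus[name])` for each returned name gives a crude version of the flow. Exact word matching like this would trip over the “16”/“sixteen” and misspelling cases mentioned in the quote, and scoring every document on every query is exactly the sort of thing that would produce performance complaints at 500,000 bills.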