On March 30th, twenty economists, technologists, and government officials (Download Participant List) convened in person and by telephone at the Sloan Foundation in New York to discuss creating an open numbering scheme and platform to facilitate the comparison of data about organizations across levels of government and agencies in order to:

  • Promote greater accountability and compliance;
  • Enhance economic growth and innovation; and
  • Enable research on the evolution of companies and organizations.

This ORGPedia project is convening a wide range of experts to inform the design and scope of:

  • An open legal identifier system to enable datasets about companies to be compared. Currently, different agencies use different numbering schemes. An open ID will enable taxonomies to “talk” to one another.
  • An online platform to mash up and visualize authenticated government datasets already collected about firms and organizations pursuant to statute or regulation.
  • An Application Programming Interface (API) and supporting software libraries to make it easy for third parties to incorporate ORGPedia into their own systems.
  • A community to encourage public participation in reviewing, annotating and contributing to collected government data whether by companies and organizations or by third parties.

ORGPedia is an experiment in designing an information system that effectively combines authenticated government data with user-contributed information – a hybrid wiki – to enhance public understanding about organizations and firms.

During the March 30th discussion, participants provided their thoughts on the opportunities, challenges, and strategies for implementation, including ideas for how to prototype and pilot a first phase of the system, from the perspective of government and research communities.

This is the first in a series of five planned workshops. The Sunlight Foundation will host a second meeting on April 8th to focus on issues of corporate accountability and compliance. Subsequent meetings will focus on the needs of businesses that consume business intelligence; the technology design; and the international opportunities and implications.

For a longer description of ORGPedia see this backgrounder (HTML, Download PDF).

The following are notes summarizing the discussion from the March 30th Meeting:


There are 18 million registered legal entities in the United States. Having the ability to compare and track data about them would make it possible to:

  • Compare datasets about legal entities across regulatory regimes and states
  • Track changes in control and ownership

These capabilities would make information more transparent to the public, facilitate information sharing across agencies and states, and streamline regulatory compliance by pre-populating information requests with information about entities.

Imagine if, as with the Encyclopedia of Life, which creates a page for every organism on earth, we had a system with a page for every legal entity on earth.  Imagine if we had an “ISBN number” for every entity. It would enable all kinds of new services and research. This has become possible in the last few years as a result of advances in web technology and policies for opening up access to public data. The challenge is that firms evolve faster than fish and firms can morph into new firms with different names and owners through changes in control.

At root, we must address the fundamental microeconomic problem of identifying the boundaries of the firm. What if Adam Smith’s pin factory had a financing arm? Or an exclusive steel supplier? We now have the technology to represent these relationships and make them transparent.

Benefits to Government:

Having a stable, unique identifier system, whether by means of a single number, a data dictionary to translate across numbering schemes, or both (a single entity identifier plus a way to translate other common fields across schemes), would enable comparison of corporate activity across levels of government, across states, and across agencies. Right now we don’t know if a company doing business in one state is the same as, or related to, a company doing business in another state. So when malfeasance is committed in one place, we are missing an opportunity to be on the lookout before it happens in another state. It would be incredibly valuable to have a way to generate early warning signals.
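To make the "data dictionary" idea concrete, here is a minimal sketch of a crosswalk that maps each agency's or state's local number to one open identifier. Everything named here is an assumption for illustration: the `Crosswalk` class, the scheme names `state_ny` and `state_ca`, and the `ORG-…` identifiers are hypothetical and do not describe any existing ORGPedia design.

```python
class Crosswalk:
    """Hypothetical data dictionary: (scheme, local_id) <-> open entity ID."""

    def __init__(self):
        self._to_open = {}    # (scheme, local_id) -> open_id
        self._from_open = {}  # open_id -> {scheme: local_id}

    def register(self, open_id, scheme, local_id):
        """Record that local_id under scheme refers to the entity open_id."""
        self._to_open[(scheme, local_id)] = open_id
        self._from_open.setdefault(open_id, {})[scheme] = local_id

    def resolve(self, scheme, local_id):
        """Return the open ID for a local number, or None if unknown."""
        return self._to_open.get((scheme, local_id))

    def translate(self, scheme, local_id, target_scheme):
        """Translate a number from one scheme into another, or None."""
        open_id = self.resolve(scheme, local_id)
        if open_id is None:
            return None
        return self._from_open[open_id].get(target_scheme)


def same_entity(xw, a, b):
    """True if two (scheme, local_id) pairs resolve to the same open ID."""
    first, second = xw.resolve(*a), xw.resolve(*b)
    return first is not None and first == second


# Made-up registrations: one company registered in two states, one in one state.
xw = Crosswalk()
xw.register("ORG-0001", "state_ny", "NY-4471")
xw.register("ORG-0001", "state_ca", "C998812")
xw.register("ORG-0002", "state_ny", "NY-9000")

print(same_entity(xw, ("state_ny", "NY-4471"), ("state_ca", "C998812")))
print(xw.translate("state_ny", "NY-4471", "state_ca"))
```

With a table like this in place, the question "is the company in this state the same as the one in that state?" becomes a lookup rather than a manual name-matching exercise. Building and authenticating the table is the hard part, and the sketch does not address it.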

Having a unique identifier or the ability to pull data from a common and authenticated collection of data about an entity would reduce the transaction costs to entities wishing to comply with requirements across multiple states.

The federal government alone spends $3.5 trillion. The public should be able to slice and dice that spending data. In order to make the information about how government spends accessible to people, we need to be able to trace this money even when companies change ownership and name. For example, Boeing acquired McDonnell Douglas, yet a search today does not connect these two entities to provide an accurate picture.

Even though we track to the subcontractor level, we have none of the history to connect affiliates and see relationships.

This makes having a unique identifier a priority. If we had the ability to trace changes such as mergers, we could better understand the connection, if any, between government grants/contracts and campaign contributions; we could spot fraud and remove offending companies from the rolls across agencies.
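As a rough sketch of what tracing a merger could look like once entities carry stable identifiers, the snippet below follows recorded change-of-control events to the surviving entity and rolls spending up under it. The identifiers, the `successors` mapping, and the dollar amounts are all invented placeholders for illustration, not real award data, and a real system would also need dates and partial-ownership cases that this ignores.

```python
def current_entity(entity_id, successors):
    """Follow successor links until reaching an entity with no successor."""
    seen = set()
    while entity_id in successors:
        if entity_id in seen:
            # Guard against bad records that point back at themselves.
            raise ValueError(f"cycle in succession records at {entity_id}")
        seen.add(entity_id)
        entity_id = successors[entity_id]
    return entity_id


def total_by_current_entity(awards, successors):
    """Sum award amounts under whichever entity survives today."""
    totals = {}
    for entity_id, amount in awards:
        key = current_entity(entity_id, successors)
        totals[key] = totals.get(key, 0) + amount
    return totals


# Hypothetical IDs; the mapping records "acquired entity -> acquirer".
successors = {"ORG-MCDONNELL-DOUGLAS": "ORG-BOEING"}

# Placeholder amounts, not actual contract figures.
awards = [
    ("ORG-MCDONNELL-DOUGLAS", 100),
    ("ORG-BOEING", 250),
    ("ORG-OTHER", 40),
]

totals = total_by_current_entity(awards, successors)
print(totals)
```

The same roll-up could be applied to grants, contracts, or enforcement actions, which is what would let a search for one name surface the history recorded under the other.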

There was some discussion about the need to keep a level of information private, especially about the individuals involved, even as we maintain public information at the entity level.

Benefits For Researchers:

Think about scholars working with the firm as the unit of analysis: they all incur the same redundant transaction costs. This situation cries out for a public data set.

There are huge transaction costs associated with doing work about firms. Data sets tend to be proprietary and limited in scope, and the information is at best outdated and, at worst, just terrible.

Scholars in accounting, business strategy, information technology management, finance, and political science are all engaging in the same socially wasteful, redundant activity of trying to clean and match this data. If we could free up some of the time spent on cleaning data, we would free up researcher capacity.

For example, the New York Times did a Pulitzer Prize-winning piece on a worker death at a manufacturing firm. It was tremendously labor intensive, and next to impossible, to investigate the environmental compliance record of the same entity, though preliminary analysis showed the firm was turning in the same toxic release statements to regulators each year rather than developing new figures.

If we wanted to “mash up” OSHA compliance data with EPA compliance data, we could not do it today. Researchers have the interest, but the incompleteness of the data makes it too hard.
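For illustration only, this is roughly what such a mash-up could look like if both datasets carried a shared identifier. The `open_id` field, the record layouts, and the sample rows are assumptions; neither agency's data is described here, and the missing shared key is exactly the gap these notes identify.

```python
from collections import defaultdict


def mash_up(osha_records, epa_records):
    """Group records from both sources under the shared (hypothetical) open_id."""
    combined = defaultdict(lambda: {"osha": [], "epa": []})
    for rec in osha_records:
        combined[rec["open_id"]]["osha"].append(rec)
    for rec in epa_records:
        combined[rec["open_id"]]["epa"].append(rec)
    return dict(combined)


def flagged_by_both(combined):
    """Entities that appear in both the labor and the environmental data."""
    return sorted(oid for oid, recs in combined.items()
                  if recs["osha"] and recs["epa"])


# Invented sample rows with made-up fields.
osha = [
    {"open_id": "ORG-0001", "violation": "machine guarding"},
    {"open_id": "ORG-0003", "violation": "fall protection"},
]
epa = [
    {"open_id": "ORG-0001", "violation": "late release report"},
    {"open_id": "ORG-0002", "violation": "permit lapse"},
]

combined = mash_up(osha, epa)
both = flagged_by_both(combined)
print(both)
```

Once the join key exists the code is trivial, which suggests the obstacle is the identifier rather than the technology.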

Over 50% of the business outputs in the United States are coming from intangibles. But there is no way to match up firms with IP output because we can’t connect patent registrations to the entities that hold the IP. At a time when innovation is becoming more important as a driver of the economy, this work is more important, not less.

The field of business history is dying off because of the difficulty of doing empirical research.


Technologically, this problem is not unlike the naming issues we face today in using websites (or banking codes) to identify entities, where we are now trying to make sense of secondary pages such as the About page, the address page, etc., which search engines know how to do.

We have the ability to map when a firm is taken over, complex interdependencies, who owns what.

Visualizations will help make this data more usable. We can show where data came from, whether it is authenticated government data, or contributed by the public.
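One way to make provenance displayable is to attach a source tag to every individual fact, so that a page can show authenticated and contributed information side by side without blurring them. The sketch below assumes hypothetical names throughout (`Source`, `Fact`, the sample values and origins); it illustrates the idea rather than proposing a schema.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    GOVERNMENT = "authenticated government data"
    PUBLIC = "public contribution"


@dataclass(frozen=True)
class Fact:
    field_name: str
    value: str
    source: Source
    origin: str  # who supplied it, e.g. an agency or a contributor (made up here)


def label(fact):
    """Render a fact together with where it came from."""
    return f"{fact.field_name}: {fact.value} [{fact.source.value}, via {fact.origin}]"


def group_by_source(facts):
    """Split facts so a page can display each kind in its own section."""
    groups = {source: [] for source in Source}
    for fact in facts:
        groups[fact.source].append(fact)
    return groups


# Invented facts about one hypothetical entity.
facts = [
    Fact("registered_name", "Example Pin Works LLC", Source.GOVERNMENT, "state registry"),
    Fact("trade_name", "Pin Works", Source.PUBLIC, "contributor-17"),
]

groups = group_by_source(facts)
for fact in facts:
    print(label(fact))
```

Keeping the tag on each fact, rather than on the whole page, is a design choice assumed here because it lets a hybrid wiki mix both kinds of content on one entity page.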

The technology platforms for building this kind of site exist. There are no show stoppers. Some work will be needed at the applied research level to transition technology from research to practice, but there are existing models.

The Encyclopedia of Life, funded by Sloan, provides some important organizational lessons learned about running a system of this type and complexity with a mix of authoritative and open information.


Adding a single field to existing identifier systems (i.e., a universal identifier) might not be hard. Adding several fields to track changes in control, however, could be costly. Still, there are Web technologies that can mitigate most of this cost if properly deployed.

What is the right role of the government? Should the government own such a system or should it be a stand-alone non-profit? What is the right governance structure to ensure legitimacy?

Pilot and Partners

Three areas of focus for potential pilot/prototype came up:

  • Mashing up Environment and Labor enforcement databases
  • Mashing up SEC’s XBRL data about public companies with state registrations to track and display changes in ownership
  • Mashing up patent office applications with state corporate registrations to see who is patenting what

The National Organization of Secretaries of State would be a natural partner for implementing the necessary changes.

Also check out B-Lab, a younger, more entrepreneurial set of companies committed to social benefit who might be willing to test contributing more of their data to be used in a pilot.

Check out: Bottega and Powell, Creating a Linchpin for Financial Data: Toward a Universal Legal Entity Identifier.

Check out: UK Companies House, which does impose an LEI but would benefit from the win/win of gains to companies and transparency of getting companies to share their data through such a platform. There will be a June/July paper on corporate reporting.

Check out the book: The Demography of Corporations