Census faces privacy issues

WASHINGTON — Census block 1002 in downtown Chicago is wedged between Michigan and Wabash avenues, a glitzy Trump-branded hotel and a promenade of cafes and bars. According to the 2020 census, 14 people live there – 13 adults and one child.

Also according to the 2020 census, they live under water, because the block consists entirely of a 700-foot bend in the Chicago River.

If that seems impossible, well, it is. The Census Bureau itself says that the figures for Block 1002 and tens of thousands of other blocks are unreliable and should be ignored. And it should know: the bureau’s own computers moved those people there so they could not be traced to their real residences, part of a radical new effort to preserve their privacy.

This paradox is at the heart of a debate roiling the Census Bureau. On the one hand, federal law requires that census records remain confidential for 72 years. That safeguard has been crucial in persuading many people, including noncitizens and members of racial and ethnic minority groups, to voluntarily disclose personal information.

On the other hand, thousands of entities – local governments, businesses, advocacy groups and more – have relied on the bureau’s goal of counting “every person, only once, in the right place” to inform countless demographic decisions, from drawing political maps to planning disaster response to placing bus stops.

The 2020 census upends that assumption. Now the bureau says its legal mandate to protect the identities of census respondents means that some data from the smallest geographic areas it measures – census blocks, not to be confused with city blocks – must be viewed with skepticism, or even ignored.

And the consumers of this data are unhappy.

“We understand that we need to protect the privacy of individuals, and it’s important that the bureau does that,” David Van Riper, a director at the University of Minnesota’s Institute for Social Research and Data Innovation, wrote in an email. “But in my view, producing poor quality data to ensure privacy defeats the purpose of the decennial census.”

At issue is a mathematical concept called differential privacy, which the bureau is using for the first time to obscure 2020 census data. Many consumers of census data say it not only produces nonsensical results like those for Block 1002, but could also limit, on confidentiality grounds, the release of the basic statistics they rely on.

They are also vexed by its rollout. Most major census changes are tested for up to a decade; differential privacy was deployed within a few years, and data releases already hampered by the pandemic were delayed further by the privacy changes.

Census officials call such concerns overblown, and they have mounted urgent efforts to explain the change and to adjust the privacy machinery in response to complaints.

But at the same time, they say the sweeping changes wrought by differential privacy are not only justified but inevitable given the threat to privacy, however disorienting they may be.

“Yes, block-level data has these impossible or unlikely situations,” Michael B. Hawes, senior adviser for data access and privacy at the bureau, said in an interview. “It’s by design. You might consider it a feature, not a bug.”

And that’s the point. For the career data nerds who are the stewards of the census, uncertainty is a statistical fact of life. For their clients, the images of census blocks with homes but no people, people without homes, and even people living underwater proved indelible, as if the curtain had been pulled back on a demographic Great Oz.

“They shattered the illusion – an illusion that everyone thought these point estimates were always good enough or the best possible,” said danah boyd (the lowercase is her choice), a technology scholar who co-wrote a study of the privacy debate. “Census Bureau leaders have known for decades that this small-area data has all kinds of problems.”

The difference now, she said, is that everyone else knows it, too.

A bit of history: census blocks – there are 8,132,968 of them – originated more than a century ago to help cities better measure their populations. Many are actual city blocks, but others are larger and irregularly shaped, especially in suburban and rural areas.

For decades, the Census Bureau withheld most block-level data for privacy reasons, but relented as demand for hyperlocal data grew insatiable. A turning point came in 1990: census blocks were extended nationwide, and the census began asking detailed questions about race and ethnicity.

That extra detail allowed outsiders to reverse-engineer census statistics and identify specific respondents – in, say, a census block with a single Asian American mother. The bureau covered those tracks by exchanging easily identified respondents between census blocks, a practice called swapping.

But by the 2010 census, explosions of computing power and commercial data had breached that guardrail. In one analysis, the bureau found that 17% of the country’s population could be reconstructed in detail – age, race, sex, household status and more – by merging census data with even poor-quality outside databases of names and addresses.

Today, “any computer science undergraduate could do a reconstruction like this,” Hawes said.
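The mechanics of such a reconstruction can be shown on a deliberately tiny, hypothetical example: enumerate every possible database and keep only those consistent with the published block-level tables. If exactly one survives, the block is fully reconstructed. Every number below is invented for illustration; real attacks work the same way at vastly larger scale, typically using integer programming rather than brute force.

```python
from itertools import combinations_with_replacement, product

# Toy domain: each resident is a record (age, race).
AGES = [25, 30, 35]
RACES = ["A", "B"]
records = list(product(AGES, RACES))

# Hypothetical published tables for one block:
POP = 2          # total population
AGE_SUM = 55     # sum of ages (recoverable from published mean age)
RACE_A = 1       # residents of race A
AGE_SUM_A = 25   # sum of ages among race-A residents

# Brute-force every database of POP records and keep those that
# match all four published statistics.
candidates = [
    c for c in combinations_with_replacement(records, POP)
    if sum(a for a, _ in c) == AGE_SUM
    and sum(1 for _, r in c if r == "A") == RACE_A
    and sum(a for a, r in c if r == "A") == AGE_SUM_A
]

# Only one database fits: a 25-year-old of race A and a
# 30-year-old of race B. The block is fully reconstructed.
print(candidates)  # [((25, 'A'), (30, 'B'))]
```

The age-sum table alone leaves several consistent databases; each additional cross-tabulation prunes the set until only the true one remains, which is why publishing many detailed tables for a tiny block is so risky.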

The solution for the 2020 census, differential privacy – also used by companies like Apple and Google – applies algorithms to the census data set as a whole rather than altering individual records. The resulting statistics contain “noise,” computer-generated inaccuracies, in small areas like census blocks. But the inaccuracies fade when the blocks are merged into larger wholes.
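The core idea can be sketched with the Laplace mechanism, the textbook building block of differential privacy (the bureau's actual TopDown algorithm is far more elaborate). Zero-mean noise distorts each block count, sometimes into impossible values, but largely cancels out when blocks are summed. All figures below are made up, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true populations of 1,000 census blocks in one county.
true_blocks = rng.integers(0, 50, size=1000)

# Laplace mechanism: for a counting query with sensitivity 1, adding
# Laplace(0, 1/epsilon) noise satisfies epsilon-differential privacy.
epsilon = 0.5
noisy_blocks = true_blocks + rng.laplace(0.0, 1.0 / epsilon, size=1000)

# Individual blocks can be badly distorted -- some even go negative,
# the toy analogue of people counted in a river bend.
worst = np.abs(noisy_blocks - true_blocks).max()

# But the noise is zero-mean, so it largely cancels when the blocks
# are merged into a county total.
agg_error = abs(noisy_blocks.sum() - true_blocks.sum()) / true_blocks.sum()
print(f"worst block error: {worst:.1f}, county-total error: {agg_error:.2%}")
```

A smaller epsilon means more noise and stronger privacy; the bureau's tunable "privacy-loss budget" is essentially a choice of where to sit on that trade-off.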

The change brings distinct benefits to the Census Bureau. Where swapping is a crude way to hide data, differential privacy algorithms can be tuned to meet specific privacy needs. And the bureau can now tell data users roughly how much noise it has injected.

In the eyes of data scientists, census block statistics have always been inaccurate; it’s just that most users didn’t know about it. From this perspective, differential privacy makes census counts more accurate and transparent – not less.

Outsiders see things differently. A Cornell University analysis of the most recent data release for New York State concluded that one in eight census blocks was a statistical outlier, including one in 20 with houses but no people, one in 50 with people but no houses, and one in 100 populated only by people under 18.

Those anomalies should diminish as the algorithms are refined and new data sets are released. But some experts say they still fear the numbers will be unusable.

Some civil rights advocates worry that noisy block data could make it harder to draw political boundaries under the Voting Rights Act’s provisions for minority representation, though others see no problem. Some experts who draw political maps say they have struggled with the new data.

The block anomalies weren’t a problem in large districts, but they “were causing real havoc in city council wards,” said Kimball Brace, whose company, Election Data Services, primarily serves Democratic clients.

Critics also worry that the bureau will limit the release of some important statistics to larger areas like counties on the grounds that census-block numbers are unreliable.

Mr. Hawes, the bureau’s privacy adviser, said that could happen. But because the differential privacy restrictions are adjustable, “we’re adding other lower-level geographic tables based on the feedback we’ve received,” he said.

Such openness is a major change at an agency where confidentiality is a mantra. The shift to differential privacy might be less difficult if the bureau better answered a basic question: “Since there is so much data commercially available, why do we care about protecting census data?” said Jae June Lee, a data scientist at Georgetown University who advises civil rights groups on the change.

The answer, said Cynthia Dwork, a computer scientist at Harvard University and one of the four inventors of differential privacy, is that a new era of rampant technology and growing intolerance has made privacy constraints more important than ever.

Loosen them, she said, and census data could reveal subsidized-housing tenants taking in unauthorized boarders to make ends meet. Or the data could be used by hate groups, and the politicians who echo them, to target people who do not conform to their preferences.

“Imagine some kind of weaponization, where someone decides to make a list of all the gay households across the country,” she said. “I expect there are people who would write the software to do that.”
