Last time, in "My Number and Privacy: Requirements for Identifiers," I wrote the following:
Next, as a hypothetical example, let's consider a common number for taxes and social security (including pensions). This would need to meet the requirements of both the basic pension number and the taxpayer number. That means multiplying the "long term" required by pension management by the "wide range" required by taxes, so an identifier that is "stable over a wide range for a long period of time" is required. This does not meet requirement 3, so it is a bad combination.
This time, I would like to use this as a basis to think about the characteristics of the risks posed by identifiers.
Identifier types
Now, before we get into the main topic, let's review the definition of an identifier and look at its types.
Identifier: A combination of attributes that can uniquely distinguish (identify) an individual (or thing) from others in a group.
That's right. Not only what we normally think of as an "individual number," but also things like "name, gender, date of birth, and address" (the four basic pieces of information) are identifiers.
In classifying these, we will use three categories: "scope of use," "period of use," and "reusability." "Scope of use" refers to how many people, companies, organizations, etc. use the identifier, "period of use" refers to how long the identifier will be used, and "reusability" refers to whether the identifier can be reused for other people.
- Omnidirectional identifier: An identifier that is used regardless of the other party.
- Unidirectional identifier (directional identifier, sectoral identifier): An identifier used only within a relationship with a particular other party.
- Persistent identifier: An identifier that remains the same for a long period of time, usually as long as the entity exists.
- Short-term identifier (ephemeral identifier): An identifier that changes over a short period of time.
- Reusable identifier (reassignable identifier): An identifier that can be reused for other entities.
- Non-reusable identifier (non-reassignable identifier): An identifier that cannot be reused.
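The three classification axes above can be written down as a small data model. The following is only a minimal sketch: the class and field names are hypothetical, and the reusability values assigned to the examples are placeholder assumptions, since the article does not state them. The scope and period values follow the pension number, yearly tax number, and common number examples discussed in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    OMNIDIRECTIONAL = "omnidirectional"  # used regardless of the other party
    UNIDIRECTIONAL = "unidirectional"    # used only within one relationship or sector


class Period(Enum):
    PERSISTENT = "persistent"  # stays the same for a long time
    EPHEMERAL = "ephemeral"    # changes over a short time


class Reuse(Enum):
    REASSIGNABLE = "reassignable"          # can be reused for another entity
    NON_REASSIGNABLE = "non-reassignable"  # cannot be reused


@dataclass(frozen=True)
class IdentifierProfile:
    """Hypothetical record describing one identifier along the three axes."""
    name: str
    scope: Scope
    period: Period
    reuse: Reuse

    def is_wide_and_long(self) -> bool:
        # True for the combination the article warns about:
        # circulating widely AND remaining stable for a long time.
        return (self.scope is Scope.OMNIDIRECTIONAL
                and self.period is Period.PERSISTENT)


# Reuse values below are assumptions made only for illustration.
pension_number = IdentifierProfile(
    "pension number", Scope.UNIDIRECTIONAL, Period.PERSISTENT, Reuse.NON_REASSIGNABLE)
yearly_tax_number = IdentifierProfile(
    "tax number that changes yearly", Scope.OMNIDIRECTIONAL, Period.EPHEMERAL, Reuse.NON_REASSIGNABLE)
common_number = IdentifierProfile(
    "common tax and pension number", Scope.OMNIDIRECTIONAL, Period.PERSISTENT, Reuse.NON_REASSIGNABLE)

for ident in (pension_number, yearly_tax_number, common_number):
    print(ident.name, ident.is_wide_and_long())
```

Only the common number is both wide and long-lived, which is the combination the quoted passage calls a bad one.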
Identifier matching risk captured by period x range
Privacy violations can arise in many ways. "Information leaks," which are often reported in newspapers, are a typical example. Here, however, we will focus mainly on "matching risk," that is, the risk of records being linked together. This is because what actually causes privacy damage is not the leak itself but the "linking" of the leaked information with other information (for example, prior knowledge held by the victim's acquaintances), which forms an "image of the person" that the person does not want. Even if information leaks, there is almost no risk if it sinks to the bottom of the ocean without anyone seeing it.
One aspect of the matching risk due to identifiers can be captured in terms of period x range, as shown in the following figure.

The horizontal axis is the period during which the identifier is in circulation, and the vertical axis is the range over which it circulates. The period refers to how long the identifier will be used, and the range refers to how far its use will spread. For example, web cookies are only used on the site that issued them, so their range of circulation is extremely narrow. So-called sectoral identifiers are also narrow, as they are only used within their sector. On the other hand, US SSNs and the like are used everywhere, so their range of circulation is wide.
Figure 1 shows an identifier that has a long circulation period but a narrow circulation range. Pension numbers are an example of this type of identifier. Figure 2 shows the opposite, an example of an identifier that has a short circulation period but a wide circulation range. For example, tax numbers that change every year are an example of this type of identifier.

The risk of using these identifiers together is the sum of the risks of each.

On the other hand, if the identifiers are integrated, as shown in the following figure, the risks will increase compared to when they are used in combination without integration. If the purpose of use is limited to the respective uses that they had when they were separate, the benefits will not change, so this increase can be said to be an increase in risk without benefit. We will call this excess risk.

This excess risk is the reason why we said that "a common number for taxes and pensions is a bad idea."
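The period x range picture can be turned into simple arithmetic. The sketch below rests on assumptions: risk is modeled as the area period x scope, the risk of using two identifiers side by side is the plain sum of the two areas (as stated above), and an integrated identifier must cover the longer period and the wider scope. The numeric values are arbitrary units chosen for illustration, not measurements, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circulation:
    period: float  # how long the identifier circulates (arbitrary units)
    scope: float   # how widely the identifier circulates (arbitrary units)

    def risk(self) -> float:
        # Assumed model: matching risk is proportional to the area period x scope.
        return self.period * self.scope


def combined_risk(a: Circulation, b: Circulation) -> float:
    # Two separate identifiers used together: the risks simply add up.
    return a.risk() + b.risk()


def integrated(a: Circulation, b: Circulation) -> Circulation:
    # One shared identifier must satisfy both uses:
    # the longer period and the wider scope.
    return Circulation(max(a.period, b.period), max(a.scope, b.scope))


def excess_risk(a: Circulation, b: Circulation) -> float:
    # Risk added by integration with no added benefit.
    return integrated(a, b).risk() - combined_risk(a, b)


# Illustrative values only: long but narrow vs. short but wide.
pension = Circulation(period=40.0, scope=1.0)
tax = Circulation(period=1.0, scope=10.0)

print(combined_risk(pension, tax))       # 50.0
print(integrated(pension, tax).risk())   # 400.0
print(excess_risk(pension, tax))         # 350.0
```

With these illustrative numbers the integrated identifier carries eight times the risk of the two separate identifiers, and the entire difference is excess risk.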
Cost of linkage
So why do people promote a system that generates such excess risk? I think it is probably because "linkage costs money." There are two types of costs: operational costs and system costs.
First, regarding operational costs: will these increase if we link separate identifiers?
Operations were originally carried out using separate numbers/identifiers, so I don't think linking them would increase costs. All you need to do is set the "scope" appropriately. (This is why it is said that the definition of the sector is important in the sectoral system.)
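Next, let's look at system costs. Certainly, once the identifiers are integrated, there is no need to convert between different identifiers, so the cost will be lower. But how much lower? Let's calculate the conversion cost. The following is output from the OpenSSL speed benchmark for AES-128-CBC on 16-byte blocks:
Doing aes-128 cbc for 3s on 16 size blocks: 21715758 aes-128 cbc's in 3.00s
That is roughly 7.2 million 16-byte encryptions per second.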
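To put the benchmark line above in perspective, the following sketch parses it and estimates how long a batch of identifier conversions would take. It assumes that one identifier conversion costs about one 16-byte AES operation and ignores I/O, key management, and integrity checks. The batch size of 100 million is a hypothetical figure chosen for illustration, not a number from the article.

```python
import re

# The benchmark line quoted in the article.
BENCH_LINE = ("Doing aes-128 cbc for 3s on 16 size blocks: "
              "21715758 aes-128 cbc's in 3.00s")

# Hypothetical workload size, for illustration only.
HYPOTHETICAL_CONVERSIONS = 100_000_000


def ops_per_second(line: str) -> float:
    """Extract the operation count and elapsed time, and return ops/second."""
    match = re.search(r"(\d+) aes-128 cbc's in ([\d.]+)s", line)
    if match is None:
        raise ValueError("unrecognized benchmark line")
    return int(match.group(1)) / float(match.group(2))


def seconds_for(conversions: int, line: str = BENCH_LINE) -> float:
    """Estimated time to perform the given number of conversions at that rate."""
    return conversions / ops_per_second(line)


rate = ops_per_second(BENCH_LINE)
print(f"{rate:,.0f} conversions per second")
print(f"{seconds_for(HYPOTHETICAL_CONVERSIONS):.1f} s for "
      f"{HYPOTHETICAL_CONVERSIONS:,} conversions")
```

Under these assumptions, the hypothetical batch finishes in well under a minute at the measured rate, which is the sense in which the conversion cost is small.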
Conclusion
- Combining identifiers with different characteristics creates a significant risk of over-matching.
- The costs of linkage are so low that they do not justify the excess risks.
- Therefore, a common tax and pension number would not make sense.
(footnote)
- Strictly speaking, you should use an AEAD mode such as GCM instead of CBC. CBC needs a separate integrity check, and that cost is not negligible. With HMAC (MD5), you can only do about 240 million operations per second per CPU.
- In fact, it would be much cheaper than implementing various security measures to control excess risk.
- Well, when it comes to the number system, we need to discuss what we want to achieve, that is, the "purpose," before considering the risks. If the purpose is not decided, it is not clear whether the risk is worth taking. For example, I think we need a clear "purpose," such as reducing administrative procedure costs by half.
I found it an interesting read.
I think it would be good to mention that the operational costs of starting a linkage include the cost of creating a mapping table between identifiers. Furthermore, since the number/identifier is personal information, linking it seems to require the individual's consent to that use, and I feel that this operational cost is very, very large.
That said, in the case of the former, as long as we have standardized name, date of birth, and street name of the address, the existence of a common number does not seem to significantly reduce the burden of creating the mapping table, and in the case of the latter, it is not about a common number in the first place, so if you think about it carefully, it seems that the conclusion is essentially the same.
The cost of mapping between identifiers is roughly the same whether it is an integrated identifier or a linked identifier. Therefore, it is not included in the discussion here. For details on mapping, please see a separate entry, "Regarding the storage and linking of common numbers/My Numbers" (http://www.sakimura.org/2012/03/1558/).
Regarding the comment, "Moreover, if you try to link the numbers/identifiers, they are personal information, so it seems necessary to obtain the individual's consent as to whether or not it is OK to use them for linking, and I feel that this operational cost will be very, very large," my conclusion is that this argument is beside the point. The consent required is exactly the same whether it is an integrated identifier or a linked identifier. In fact, the requirement for consent is even stronger for an integrated identifier because it involves "sharing." And in the My Number system, they are trying to solve this through "law = social consent."
So, essentially nothing changes; neither the conclusion nor the process of the discussion.
Of course, I have taken into consideration what you have said when writing this.
Instead, I would like to ask: why did you think consent was not required when using an integrated identifier?
The reason why I thought that consent was not required when using the integrated identifier was because, in my mind, the integrated identifier and the My Number Bill were perceived as one and the same.
① Integrated identifier, with legislation exempting consent
② Integrated identifier (no legislation exempting consent)
③ Linked identifiers, with legislation exempting consent
④ Linked identifiers (no legislation exempting consent)
Of these, only ① and ④ came to mind. ② and ③ were not even in my head.
That's how I started writing my comment, but while I was writing, I was vaguely thinking about the above (though I hadn't organized my thoughts that well), and I ended up writing with the conclusion that "essentially the conclusion doesn't change." This may sound like I'm defending myself, but I think many people are just as unaware of option ③.
Regarding the mapping cost between identifiers, the mechanism of the linkage platform was not immediately apparent to me from the context above, and the only options I considered for the "number" were the integrated identifier and the linked identifier.
① Integrated identifier (number), linked via codes
② Integrated identifier (number), linked directly by the number itself
③ Linked identifiers (numbers), linked by mapping between the identifiers without using codes
④ Linked identifiers (numbers), linked via codes
I thought of it as a comparison between ② and ③ among the options above. If ② is naturally excluded to reduce privacy risks, the rest require some kind of mapping, so I understand that there is not much difference in terms of workload.
Unlike the premise of the discussion linked to above, I was assuming that, rather than relying on the four basic pieces of information, once the number is known, whether it is an integrated identifier or a linked identifier, the code can be easily obtained from the linkage platform.
Having considered the above, I am quite unsure as to whether my understanding of the statement "the cost of mapping between identifiers is roughly the same whether they are integrated identifiers or linked identifiers" is correct.
Even if you choose option ②, you still need to map the integrated identifier to an existing account, so there will be costs involved.
Understood.
Thank you very much.
I would be happy if you could write an explanatory article on the issues I have been wondering about at some point.
I think most people mistakenly believe that linking would become easy if we simply introduced an integrated identifier (common number) and skipped complicated steps such as converting codes on the linkage platform. The reality is far removed from ordinary people's intuition, so I feel that unless it is explained very well, people won't be convinced.
It may or may not be a high priority, and I'm sure you're busy, so please just ignore it.