Dumb down data

By ANIL JOHN

It is a capital mistake to theorize before one has data - Sherlock Holmes

Good decisions can flow from good data. Good data is rich with meaning and relationships. Stripping the value from data to “improve the developer experience” is the wrong way to address the complexity of good data or to enable its use in practice.

The W3C Verifiable Credentials Data Model Standard, which is currently at v1.1, was not developed by big platforms or technology vendors within their silos and then put through a rubber-stamping “standardization” process via venues they control.

It was developed openly and with global input at the World Wide Web Consortium (W3C) by people and organizations that deeply understood web technologies, and the value of representing data in a manner that is flexible, meaningful and open.

A core aspect of that was to ensure that Linked Data, in particular JSON-LD, is supported as a first-class citizen when it comes to representing data within the standard. This is not in any way controversial, given both the value of the data representation and the support and use of it on the open web.
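To make that concrete, here is what first-class JSON-LD support looks like in practice: a minimal sketch of a credential patterned on the university-degree example in the VC Data Model specification. The issuer, subject and claim values are illustrative placeholders, and the Python wrapper is simply a convenient way to show the structure; it is not part of the standard.

```python
# A minimal sketch of a W3C Verifiable Credential expressed as JSON-LD,
# patterned after the university-degree example in the VC Data Model v1.1.
# The issuer, subject, and claim values below are illustrative placeholders.
import json

credential = {
    # The @context is what makes this JSON-LD rather than "plain" JSON:
    # every term that follows resolves to a globally unambiguous IRI.
    "@context": [
        "https://www.w3.org/2018/credentials/v1",
        "https://www.w3.org/2018/credentials/examples/v1",
    ],
    "type": ["VerifiableCredential", "UniversityDegreeCredential"],
    "issuer": "https://example.edu/issuers/565049",
    "issuanceDate": "2023-02-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "degree": {
            "type": "BachelorDegree",
            "name": "Bachelor of Science and Arts",
        },
    },
}

print(json.dumps(credential, indent=2))
```

Strip out the @context and what is left is just another JSON blob whose field names mean whatever each consumer decides they mean; with it, any conformant JSON-LD processor expands those same fields into globally unambiguous terms.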

Incumbent allergic reaction

However, as I mentioned in Battle for the brand, the incumbent vendors who are comfortable with the status quo have woken up to the support and global traction for this new standard, and to the fact that their current products do not support its power and flexibility; i.e., they would have to actually compete in this new marketplace!

As such, they are actively …

Seeking to reopen settled consensus decisions, reached over many working group sessions, that resulted in the current v1.1 of the W3C VC standard, a move that will break existing secure, privacy-respecting and standards-compliant implementations

And a core component that they are trying to break is the first-class support for JSON-LD within the standard, which gives the data representation its power and flexibility.

Value of smart data

The challenge in this area tends to be how to convey both the value of data and the implications of stripping away that value to an audience of folks who are not “data people”.

But lo and behold, as I was checking my reading feeds this week, a blog post by Nis Jespersen, UN Web Vocabulary Project Lead, on how one could See the Global Supply Chain with Knowledge Graphs and UN Web Semantics popped up.

It provides a write-up of a presentation (PDF) and live demo to the UN/CEFACT Forum on Linked Data:

The essential part of my presentation was a live demo, building a supply chain knowledge graph from the bottom up. In doing so, I gradually introduced the full tech stack in play:

  • The essentials of APIs, JSON and JSON Schema
  • Adding semantics with JSON Linked Data (JSON-LD)
  • Building Knowledge Graphs from LD files
  • The important role of standardized Web Vocabularies

I would highly urge people, particularly those who are skeptical about the value of JSON-LD, to read the write-up and ask questions about it.

For me the take-away was the following:

We did this with literally no manual data mapping. The knowledge graph just “snaps” into place like magnets. This means we can dump massive amounts of JSON-LD files at the graph database. Data can come from different origins, APIs, data schemas, etc — it will all still snap together automatically. […]

Without semantic context, raw data is meaningless. Traditional APIs depend on human intuition and labor to make sense of data. But we have seen how simple it is instead to add an explicit, declarative context and let computers do all the hard work. […]
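To make the “snaps together” point concrete, here is a minimal sketch of that effect: two JSON-LD documents about the same consignment, from two hypothetical sources, loaded into a single graph with no mapping code written. The identifiers, the choice of schema.org as the shared vocabulary, and the use of rdflib (version 6 or later, which parses JSON-LD natively) are my assumptions for illustration; this is not the tooling from Jespersen’s demo.

```python
# A minimal sketch of the "snaps together" effect: two JSON-LD documents from
# different (hypothetical) origins describe the same consignment using a shared
# vocabulary, so they merge into one graph with no mapping code.
# Assumes rdflib 6+ with built-in JSON-LD support.
import json

from rdflib import Graph

shipment_from_carrier = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://example.com/consignments/42",
    "@type": "ParcelDelivery",
    "trackingNumber": "1Z999AA10123456784",
}

shipment_from_customs = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://example.com/consignments/42",
    "deliveryAddress": {"@type": "PostalAddress", "addressCountry": "US"},
}

g = Graph()
for doc in (shipment_from_carrier, shipment_from_customs):
    g.parse(data=json.dumps(doc), format="json-ld")

# Both documents name the same node (@id), so their statements converge on a
# single subject in the graph.
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```

Because both documents identify the same node and use the same vocabulary, their statements land on the same subject in the graph. That is the “no manual data mapping” behavior being described.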

This out-of-the-box functionality, which is baked into the current version of the W3C Verifiable Credentials Standard to define W3C Verifiable Credentials using JSON-LD as the data representation, is what the incumbents are trying to strip away!

Broader applicability

I fully expect some people, after reading the article, to make the claim that while the use of Linked Data may be relevant to supply chain track and trace scenarios, it is not really relevant to any manner of personal credentials.

I call BS on that!

Let us back up for a moment; there is a tendency within the identity community (of which I consider myself a card-carrying member) to pat ourselves on the back with statements such as “Identity is the new perimeter” and “Identity is the gatekeeper”.

But as anyone who is not a technology vendor but is in the actual business of delivering digital services knows, identity is but one ingredient that goes into making a decision about benefits delivery and eligibility.

Let me categorically assure you that Anil John, showing up with confirmation that I am indeed Anil John:

  • Does not allow me to cross the border into a country
  • Does not allow me to buy age restricted products
  • Does not allow me to open a bank account

All of these situations require me to either self-assert or have an authoritative entity assert additional information about me, to a skeptical counter-party who will take as input all of those signals/data/information in order to make an (access control / benefits) decision.

And let us also be very real that some of those signals/data/information are going to be wrong, some may be outright lies, and some may be interpreted as something different from what was intended.

So the ability to provide the counter-party information in a format that allows them to process it holistically, which a data representation such as JSON-LD allows for, is critically important in personal identity/credential use cases as well.
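For the skeptics, here is a minimal sketch of what that holistic processing can look like on the relying party’s side: two separately issued, illustrative JSON-LD assertions about the same hypothetical subject are combined into one graph and answered with a single query. The identifiers, the schema.org vocabulary and the rdflib tooling are again my assumptions for illustration, not anything mandated by the standard.

```python
# A minimal sketch (hypothetical identifiers, illustrative vocabulary) of a
# relying party combining two separately issued JSON-LD assertions about the
# same subject into one graph, then asking a single question across both.
# Assumes rdflib 6+ with built-in JSON-LD support.
import json

from rdflib import Graph

identity_assertion = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "did:example:anil",
    "@type": "Person",
    "name": "Anil John",
}

age_attestation = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "did:example:anil",
    "birthDate": "1970-01-01",
}

g = Graph()
for doc in (identity_assertion, age_attestation):
    g.parse(data=json.dumps(doc), format="json-ld")

# One query over the combined evidence, even though it arrived as separate
# credentials from separate issuers.
results = g.query("""
    PREFIX schema: <https://schema.org/>
    SELECT ?name ?birthDate WHERE {
        ?person schema:name ?name ;
                schema:birthDate ?birthDate .
    }
""")
for name, birth_date in results:
    print(f"{name} was born on {birth_date}")
```

The point is not the specific library; it is that the counter-party can reason over all of the presented evidence as one connected whole instead of hand-stitching fields from separate payloads.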

Improving developer experience

What is interesting to me is that the argument being used to dumb the data down is that plain old JSON data, with no meaning, is “easier” for developers to handle and will “improve the developer experience”.

What I would like to see, instead of treating capable developers as infants and reducing the value of the data they need to work on for their customers, is investment in the tooling provided to developers to improve their comprehension and understanding of the rich and meaningful data they now have access to, so that they can provide value to their customers!

Yes, it would certainly cut into the incumbents’ current business model of catering to the lowest common denominator of data and then foisting consulting services and data-enrichment products on the same customers to increase the fidelity of that data enough to make it usable in business processes!

Unfortunately, the current path the incumbents are on is one in which they are limiting the choice available to customers, and implicitly telling them that they know better than the customer how data should be presented and used.

Needless to say, that is not a long term path to success for them in an open, competitive ecosystem!


Recently: Antidote to enshittification

The lifecycle of a profit-driven multi-sided platform is explained in this article by Cory Doctorow, and a new verb that describes the process has entered my personal lexicon:

Here is how platforms die: first, they are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back all the value for themselves. Then, they die. I call this enshittification [...]

The temptation to enshittify is magnified by the blocks on interoperability ...

Cory Doctorow

In a follow-up to the above article, Mike Masnick adds time horizon as an important aspect to consider when you hear about “maximizing shareholder value”:

There are ways around this, but it’s not easy. Cory and I push for interoperability (including adversarial interoperability) because we know in the long run it actually makes things better for users, and creates incentives for companies and services not to treat their users as an endless piggybank that can be abused at will. Cory frames it as a “freedom to exit.”

Mike Masnick

These two, separate but connected, thought pieces made a critical connection for me within the context of Interoperability.

If interoperability (including adversarial interoperability) is the antidote to enshittification, it makes sense that platforms and incumbent vendors with market share to lose will actively try to subvert the process of developing and defining the open standards that ensure that interoperability.

And they will also try to divert, subvert and appropriate any potential technology, that they were not involved in developing and controlling, that has the promise of true, global interoperability.


  • Adversarial Interoperability - “For a really competitive, innovative, dynamic marketplace, you need adversarial interoperability: that’s when you create a new product or service that plugs into the existing ones without the permission of the companies that make them.”

  • Protocols, Not Platforms: A Technological Approach to Free Speech - “This article proposes an entirely different approach—one that might seem counterintuitive but might actually provide for a workable plan that enables more free speech, while minimizing the impact of trolling, hateful speech, and large-scale disinformation efforts. As a bonus, it also might help the users of these platforms regain control of their privacy. And to top it all off, it could even provide an entirely new revenue stream for these platforms. That approach: build protocols, not platforms.”

  • Usage statistics of JSON-LD for websites - “These diagrams show the usage statistics of JSON-LD as structured data format on the web. […] JSON-LD is used by 44.4% of all the websites.”


