XML is notoriously expensive to properly parse in many languages. Basically, the entire world centers around 3 open source implementations (libxml2, expat and Xerces), if you want to get anywhere close to actual compliance. Even with them, you might hit challenges (libxml2 was largely unmaintained recently, yet it is the basis for many bindings in other languages).
The main property of SGML-derived languages is that they make "list" a first class object, and nesting second class (by requiring "end" tags), and have two axes for adding metadata: one being the tag name, another being attributes.
So while it is a suitable DSL for many things (it is also seeing new life in web components definition), we are mostly only talking about XML-lookalike language, and not XML proper. If you go XML proper, you need to throw "cheap" out the window.
Another comment to make here is that you can have an imperative-looking DSL that is interpreted as a declarative one: nothing really stops you from saying that something like the sketch below means exactly the same as the XML-alike DSL you've got.
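For instance, something along these lines (a minimal JavaScript sketch; the fact names and the naive fixed-point evaluator are illustrative, not from the article):

    // Each "statement" is just an equation: a name plus a pure function of
    // other facts. Written deliberately out of order, because order carries
    // no meaning here.
    const equations = [
      ["taxable", ({ agi, deduction }) => Math.max(0, agi - deduction)],
      ["agi",     ({ wages, interest }) => wages + interest],
    ];

    // Naive fixed-point evaluation: keep retrying equations until no progress.
    function solve(equations, inputs) {
      const env = { ...inputs };
      let progress = true;
      while (progress) {
        progress = false;
        for (const [name, eq] of equations) {
          if (name in env) continue;
          const value = eq(env); // missing deps yield NaN, so skip for now
          if (!Number.isNaN(value)) { env[name] = value; progress = true; }
        }
      }
      return env;
    }

    solve(equations, { wages: 50000, interest: 200, deduction: 14600 });
    // => { ..., agi: 50200, taxable: 35600 }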
One declarative language looking like an imperative language but really using "equations" which I know about is METAFONT. See eg. https://en.wikipedia.org/wiki/Metafont#Example (the example might not demonstrate it well, but you can reorder all equations and it should produce exactly the same result).
alexpetros 29 minutes ago [-]
Author here. I agree with all this, and I think it's important to note that nothing precludes you from doing a declarative specification that looks like imperative math notation, but it's also somewhat beside the point. Yes, you could make your own custom language, but then you have created the problem that the article is about: you need to port your parser to every single place you want to use it.
That's to say nothing of all the syntax decisions you have to make now. If you want to do infix math notation, you're going to be making a lot of choices about operator precedence. The article is using a lot of simple functions to explain the domain, but we also have switch statements—how are those going to be expressed? Ditto functions that don't have a common math notation, like stepwise multiply. All of these can be solved, but they also make your parser much more complicated and create a situation where you are likely to only have one implementation of it.
If you try to solve that by standardizing on prefix notation and parentheses, well, now you have s-expressions (an option also discussed in the post).
That's what "cheap" means in this context: There's a library in every environment that can immediately parse it and mature tooling to query the document. Adding new ideas to your XML DSL does not at all increase the complexity of your parsing. That's really helpful on a small team! I agonized over the word "cheap" in the title and considered using something more obviously positive like "cost-effective" but I still think "cheap" is the right one. You're making a cost-cutting choice with the syntax, and that has expressiveness tradeoffs like OP notes, but it's a decision that is absolutely correct in many domains, especially one where you want people to be able to widely (and cheaply) build on the thing you're specifying.
johnbarron 21 minutes ago [-]
Why does the article hardly engage with the subject of schema-driven validation?
alexpetros 14 minutes ago [-]
This is a good question! We do it, it works, and it's definitely an advantage of XML over alternatives. I just personally haven't had the time to dig in and learn it well enough to write a blog post about it. In practice I think people update the Fact Dictionary largely based on pattern matching, so that's what I focused on here.
Someone1234 26 minutes ago [-]
I keep seeing people make the same mistake XML made, over and over, without learning from it. I will clarify the problem thusly:
> The more capabilities you add to an interchange format, the harder that format is to parse.
There is a reason why JSON is so popular: it supports so little that it is legitimately easy to import. Whereas XML supports attributes, namespaces, CDATA, DTDs, QNames, xml:base, xml:lang, XInclude, etc. etc. They gave it everything, including the kitchen sink.
There was a thread here the other day about using SQLite as an interchange format to REDUCE complexity. Look, I love SQLite as an application-specific data store. But much like XML it has a ton of capabilities, which is good for a data store but awful for an interchange format with multiple producers/consumers with their own ideas.
CSV may be under-specified, but it remains popular largely due to its simplicity to produce/consume. Unfortunately, we're seeing people slowly ruin JSON by adding e.g. comments to the format, with others then using those "comments" to hold data (e.g. type information), which must be parsed. Which is a bad version of an XML attribute.
CSTML is my attempt to fix all these issues with XML and revive the idea of HTML as a specific subset of a general data language.
As you mention, one of the major learnings from the success of JSON was to keep the syntax stupid-simple -- easy to parse, easy to handle. Namespaces were probably the feature to get the most rework.
In theory it could also revive the ability we had with XHTML/XSLT to describe a document in a minimal, fully-semantic DSL, only generating the HTML tag structure as needed for presentation.
xienze 23 minutes ago [-]
> Whereas XML supports attributes, namespaces, CDATA, DTDs, QNames, xml:base, xml:lang, XInclude, etc etc. They gave it everything, including the kitchen sink.
But you don't have to use all those things. Configure your parser without namespace support, DTD support, etc. I'd much rather have a tool with tons of capabilities that can be selectively disabled rather than a "simple" one that requires _me_ to bolt on said extra capabilities.
cbm-vic-20 10 minutes ago [-]
As a data interchange format, you can only depend on the lowest commonly implemented features, which for XML is the base XML spec. For example, Namespaces is a "recommendation", and a conformant XML parser doesn't need to support it.
petcat 42 minutes ago [-]
> XML Is a Cheap [...]
> XML is notoriously expensive to properly parse in many languages.
I'm glad this is the top comment. I have extensive experience in enterprise-y Java and XML, and XML is anything but cheap. In fact, doing anything non-trivial with XML was regularly a memory and CPU bottleneck.
alexpetros 10 minutes ago [-]
In the context of the article, "cheap" means "easy to set up" not "computationally efficient." The article is making the argument that there are situations in which you benefit from sacrificing the latter in favor of the former. You're right that it's annoyingly slow to parse though and that does cause issues I'd like to fix.
diffuse_l 34 minutes ago [-]
That's if you parse the XML into a DOM and work on that. If you use SAX parsing, the memory footprint is much better.
But of course, working with SAX parsing is yet another, very different, bag of snakes.
I still wish JSON parsing had the same support for stream processing as XML (I know that there are existing solutions for that, but it's much less common than in the XML world).
bubbleRefuge 36 minutes ago [-]
Yup. SAP and their glorious IDocs with German acronyms.
twoodfin 15 minutes ago [-]
Much of XML’s complexity derives from either the desire to be round-trip compatible with any number of existing character and data encodings or the desire to be largely forward-compatible with SGML.
A parser that only had to support a specified “profile” of XML (say, UTF-8 only, no user-defined entities or DTD support generally) could be much simpler and more efficient while still capturing 99% of the value of the language expressed by this post.
phlakaton 6 minutes ago [-]
That's beside the point of this post. You're welcome to enforce such a profile on your documents, but the point of this post is the ease of throwing the whole ecosystem of out-of-the-box XML tools at it, tools which don't assume any such profile.
(Now ITOT they may have profiles of their own, e.g. where safe parsing, validation, and XSLT support are concerned, but they have a large overlap.)
phlakaton 12 minutes ago [-]
Unless you are compiling really large systems of DSL specifications, parsing speed is not the thing you want to be optimizing. XML for this use case, even if you DOM it, is plenty fast.
More concerning are the issues that result in unbounded parses – but there are several ways to control for those.
Hendrikto 7 minutes ago [-]
> XML for this use case, even if you DOM it, is plenty fast.
This mindset is why we now have computers that are three-plus orders of magnitude faster than a C64, yet have worse latency.
phlakaton 3 minutes ago [-]
Interesting you should complain about that with a legacy technology that's almost 30 years old (or 50 years old if you count SGML). In particular, XML has gotten no more complex or slow than it was 20 years ago, when development largely stopped.
For this application it's plenty fast. Even if you've got a Pentium machine.
gchamonlive 8 minutes ago [-]
That's a strange comment...
Cheap here is semantically different from cheap in the article. Here it means "how hard it hits the CPU" and in the article is "how hard it is to specify and widely support your DSL".
You also posted a piece of code that the author himself acknowledged is not bad, and omitted the one pathological example where implementation details leak when translating to JavaScript.
It just seems like you didn't approach reading the article willing to understand what the author was trying to say, as if you already decided the author is wrong before reading.
sriku 50 minutes ago [-]
While this can give a notation for the domain, you'd still need an engine to process it. Prolog+CLP(FD) perhaps meets it well (not too familiar with the tax domain) and one could perhaps paraphrase Greenspun's tenth rule to this combo too.
necovek 1 hours ago [-]
FWIW, this is also one of the reasons MathML has never become the "input" language for mathematics, and the layout-focused (La)TeX remains the de-facto standard.
Ergonomics of input are important because they increase chances of it being correct, and you can usually still keep it strict and semantic enough (eg. LaTeX is less layout-focused than Plain TeX)
raverbashing 1 hours ago [-]
> and have two axes for adding metadata: one being the tag name, another being attributes
Yes, let's not even get started on implementations that do <something value="value"></something>
1a527dd5 9 minutes ago [-]
The trouble with XML has never been XML itself.
It was about how hard it is to generate good XML.
Because the format is complicated and no one really agrees on how to properly represent an idea or concept, you have to deal with varying output between producers.
I personally love well formed XML, but the std dev is huge.
Things like JSON have a much tighter std dev.
The best XML I've seen is generated by hashdeep/md5deep. That's how XML should be.
Financial institutions are basically run on XML, but we do a tonne of work with them and my god their "XML" makes you pray and weep for a swift end.
jaen 1 hours ago [-]
Or... you could just use a programming language that looks good and has great support for embedded domain-specific languages (eDSL), like Haskell, OCaml or Scala.
Or, y'know, use the language you have (JavaScript) properly, eg. add a `sum` abstraction instead of `.reduce((acc, val) => { return acc+val }, 0)`.
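For example (a one-line sketch):

    const sum = (xs) => xs.reduce((acc, x) => acc + x, 0);
    sum([100, 50, 25]); // 175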
In particular, the problem of "all the calculations are blocked for a single user input" is solved by eg. applicatives or arrows (these are fairly trivial abstract algebraic concepts, but foreign to most programmers), which have syntactic support in the abovementioned languages.
(Of course, avoid the temptation to overcomplicate it with too abstract functional programming concepts.)
If you write an XML DSL:
1. You have to solve the problem of "what parts can I parallelize and evaluate independently" anyway. Except in this case, that problem has been solved a long time ago by functional programming / abstract algebra / category-theoretic concepts.
2. It looks ugly (IMHO).
3. You are inventing an entirely new vocabulary unreadable to fellow programmers.
4. You will very likely run into Greenspun's tenth rule if the domain is non-trivial.
librasteve 58 minutes ago [-]
Suggest adding Raku to that list. All the early Raku devs were Haskell coders (the first Raku parser, PUGS, was written in Haskell).
Since Raku supports both OO and functional coding styles, and has built-in Grammars, it is very nice for DSLs.
kwon-young 8 minutes ago [-]
The article mentions Prolog but doesn't mention that you can use constraints to fully express the author's computation graph.
My preferred library is clpBNR, which has powerful constraints over booleans, integers and floats.
If you restrict yourself to the pure subset of Prolog, you can even express complicated computations involving conditions or recursions.
However, this means that your graph is now encoded in the Prolog code itself, which is harder to manipulate, but still fully manipulable in Prolog.
But the author talks about XML as an interchange format, which is indeed better than Prolog code...
sgarland 1 hours ago [-]
While a great article, I actually found this linked post [0] to be even better, in which the author lays out how so much modern tooling for web dev exists simply because XML lost the browser war.
EDIT: obviously, JSON tooling sprang up because JSON became the lingua franca. I meant that it became necessary to address the shortcomings of JSON, which XML had solved.
0: https://marcosmagueta.com/blog/the-lost-art-of-xml/
I'm not sure what the author means by "(XML) was abandoned because JavaScript won. The browser won."
The browser supported XML as much as Javascript. Remember that the "X" in the "AJAX" acronym stands for XML, as does the "XML" in "XMLHttpRequest", which was originally intended for fetching data on the fly in XML. It was later repurposed to grab JSON data.
Javascript was not a reason XML was abandoned. It was just that the developer community did not like XML at all (after trying to use it for a while).
As for whether the dev community was "right", it's hard to comment because the article you linked is heavy on the ranting but light on the contextual details. For example it admits that simpler formats like JSON might be appropriate where "small data transfers between cooperating services and scenarios where schema validation would be overkill". So are they talking about people storing "documents" and "files" in JSON form? I guess it happens, but is it really as common to use JSON as opposed to other formats like YAML (which is definitely not caused by Javascript in the browser winning)?
Personally I think XML was abandoned because of its inherently bad design (and maybe over-engineering). A simpler format with schema checking is probably more ideal IMHO.
skrebbel 1 hours ago [-]
I read both, but I feel like they both miss what it was like to work with APIs back in the bad old XML days.
Yes, XML is more descriptive. It's also much harder for programmers to work with. Every client or server speaking an XML-based protocol had to have their own encoder/decoder that could map XML strings into in-memory data structures (dicts, objects, arrays, etc) that made sense in that language. These were often large and non-trivial to maintain. There were magic libraries in languages like Java and C# that let you map XML to objects using a million annotations, but they only supported a subset of XML and if your XML didn't fit that shoe you'd get 95% of the way and then realize that there was no way you'd get the last 5% in, and had to rewrite the whole thing with some awful streaming XML parser like SAX.
JSON, while not perfect, maps neatly onto data structures that nearly every language has: arrays, objects and dictionaries. That is why it got popular, and no other reason. Definitely not "fashion" or something as silly as that. Hundreds of thousands of developers had simply gotten extremely tired of spending 20% of their working lives producing and then parsing XML streams. It was terrible.
And don't even get me started on the endless meetings of people trying to design their XML schemas. Should this here thing be an attribute or a child element? Will we allow mixing different child elements in a list or will we add a level of indirection so the parser can be simpler? Everybody had a different idea about what was the most elegant and none of it mattered. JSON did for API design what Prettier did for the tabs vs spaces debate.
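A sketch of the kind of choice that ate those meetings (shapes illustrative):

    // Three defensible XML shapes for the same record:
    //   <user id="7"><name>Ada</name></user>
    //   <user><id>7</id><name>Ada</name></user>
    //   <user id="7" name="Ada"/>
    // ...versus the one obvious JSON shape:
    const user = JSON.parse('{"id": 7, "name": "Ada"}');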
sgarland 1 hours ago [-]
Since you explicitly mentioned fashion, I assume you read this:
> There is a distinction that the industry refuses to acknowledge: developer convenience and correctness are different concerns. They are not opposed, necessarily, but they are not the same thing.
…
> The rationalization is remarkable. "JSON is simpler", they say, while maintaining thousands of lines of validation code. "JSON is more readable", they claim, while debugging subtle bugs caused by typos in key names that a schema would have caught immediately. "JSON is lightweight", they insist, while transmitting megabytes of redundant field names that binary XML would have compressed away. This is not engineering. This is fashion masquerading as technical judgment.
I feel the same way about RDBMS. Every single time I have found a data integrity issue - which is nearly daily - the fix that is chosen is yet another validation check. When I propose actually creating a proper relational schema, or leaning on guarantees an RDBMS can provide (such as making columns that shouldn’t be NULL non-NULLable, or using foreign key constraints), I’m told that it would “break the developer mental model.”
Apparently, the desired mental model is “make it as simple as possible, but then slowly add layer upon layer of complex logic to handle all of the bugs.”
skrebbel 27 minutes ago [-]
My zod schemas are 100x simpler than all those SAX parsers I maintained back in the day. Honestly I kinda doubt you've worked with XML a lot. The XML data model is wildly different from that of pretty much every programming language's built-in data structures, and it's a lot of work to cross that bridge.
The article posted here makes a good point actually. XML is a DSL. So working with XML is a bit like working with a custom designed language (just one that's got particularly good tooling). That's where XML shines, but it's also where so much pain comes from. All that effort to design the language, and then to interpret the language, it's much more work than just deserializing and validating a chunk of JSON. So XML is great when you need a cheap DSL. But otherwise it isn't.
But the article you quoted makes the case that XML was good at more stuff than "lightweight DSL", that JSON was somehow a step back. And believe me, it really wasn't. Most APIs are just that.. APIs. Data interchange. JSON is great for this, and for all its warts, it's a vast, vast improvement over XML.
hnfong 42 minutes ago [-]
In your situation, I would blame the developers, not the tools (JSON) or fashion.
Even if it's fashionable to do the wrong thing, the developer is at fault for choosing to follow fashion instead of doing the right thing.
microtonal 50 minutes ago [-]
The 'much harder for programmers to work with' was that the official way of doing a lot of programming related to XML was to do it in... XML. E.g. transformations were done with XSLT, query processing with XQuery. There were even XML databases that you had to query with XML (typically XQuery).
All these XML DSLs were so dreadful to write and maintain for humans that most people despised them. I worked in a department where semantic web and all this stuff was fairly popular, and I still remember one colleague, after another annoying XML programming session, saying fuck this, I'll rip out all the XSLT and XQuery and will just write a Python script (without the swearing, but that was certainly his sentiment). First it felt a bit like an offense to ditch the 'correct' way, but in the end everyone sympathized.
As someone who has lived through the whole XML mania: good riddance (mostly).
> And don't even get me started on the endless meetings of people trying to design their XML schemas.
I have found that this attracts a certain type of people who like to travel to meetings and talk about schemas and ontologies for days. I had to sit through some presentations, and I had no idea what they presented had to do with anything; they were so detached from reality that they built a little world of their own. Sui generis.
badgersnake 1 hours ago [-]
It’s the usual case of “I can’t be bothered to learn the complicated thing, give me something simple.” Two years later, “Oh wait, I need more features, this problem is more complicated than I thought”.
sgarland 23 minutes ago [-]
As a devil’s advocate, it is extremely difficult to produce something that’s simple to understand, flexible, and not inherently prone to bugs.
I am not a dev; I'm ops that happens to know how to code. As such, I tend to write scripts more than large programs. I've been burned enough by bash and Python to know how to tame them (mostly, rigid insistence on linters and tests), but as one of my scripts blossomed into a 15K LOC monstrosity, I could see in real time how various decisions I made earlier became liabilities. Some of these were because I thought I wouldn't need it; others were because I later learned I might need flexibility, but didn't have the fundamental knowledge to do it correctly.
For example, I initially was only using boolean return types. “It’s simpler,” I thought - either a function works, or it doesn’t, and it’s up to the caller to decide what to do with that. Soon, of course, I needed to have some kind of state and data manipulation, and I wound up with a hideous mix of side effects and callbacks.
Another: since I was doing a lot of boto3 calls in this script, some of which could kick off lengthy operations, it needed to gracefully handle timeouts, non-fatal exceptions, and mutations that AWS was doing (e.g. Blue/Green on a DB causes an endpoint name swap), while persisting state in a way that was crash-proof while also being able to resume a lengthy series of operations with dependencies, only some of which were idempotent.
I didn’t know enough of design patterns to do all of this elegantly, I just knew when what I had was broken, so I hacked around it endlessly until it worked. It did work (I even had tests), but it was confusing, ugly, and fragile.
The biggest technical learning I took away from that project was how incredibly useful true ADTs are, and how languages that have them can prevent entire classes of bugs from ever happening. I still love Python, but man, is it easy to introduce bugs.
jfengel 46 minutes ago [-]
It's not a DSL. It's a generic lexer and parser. It takes the text and gives you an abstract syntax tree. The actual DSL is your spec, and the syntax you apply.
It's one of many equivalent such parser tools, a particularly verbose one. As such it's best suited to stuff not written by hand; it's fine for generated text.
It has some advantages mostly stemming from its ubiquity, so it has a big tool kit. It has a lot of (somewhat redundant) features, making it complex compared to other options, but sometimes one of those features really fits your use case.
Sharlin 17 minutes ago [-]
> It evokes memories of SOAP configs and J2EE (it’s fine, even good, if those acronyms don’t mean anything to you).
Heh, a couple of years ago I walked past a cart of free-to-take discards at the uni, full of thousand-page tomes about exciting subjects like SOAP, J2EE and CORBA. I wonder how many of the current students even recognized any of those terms.
DamonHD 10 minutes ago [-]
I used all three of those to some extent, in investment banking back when it was bigger than tech, and while I still have some time for J2EE (WAR in particular), the other two, especially SOAP, should be taught as cautionary tales to the young 'uns...
exabrial 2 hours ago [-]
Given that it has strong XSD schema verification built in, where you can tell in an instant whether or not the document is correct, it's the right tool for a majority of jobs.
My experience has been that the people complaining about it were simply not using automated tools to handle it. It'd be like people complaining that "binaries/assembly are too hard to handle" and never using a disassembler.
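For example, with libxml2's xmllint (one widely available tool; file names illustrative):

    xmllint --noout --schema invoice.xsd invoice.xml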
hnfong 33 minutes ago [-]
> can tell in an instant whether or not the document is correct
Speaking of "correctness"... It seems to me people almost never mention that while schema verification can detect a lot of issues, in the end it cannot replace actual content validation. There are often arbitrarily complicated constraints on data that require custom code to validate.
This is analogous to the ridiculous claim that type checking compilers can tell you whether the program is correct or not.
DamonHD 8 minutes ago [-]
If your type checking was in the Martin-Löf school, and you started with a putative proof that what you wanted to execute was possible, then maybe! B^>
bananamansion 1 hours ago [-]
what jobs require XSD verification?
baq 1 hours ago [-]
Ideally all of them.
n_e 29 minutes ago [-]
After thinking a bit about the problem, and assuming the project's language is JavaScript, I'd write the fact graph directly in JavaScript:
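Something like this, perhaps (a minimal sketch; the fact names, dollar amounts, and the tiny resolver are illustrative, not the actual Direct File schema):

    // Each fact is either a raw input or a pure function of other facts.
    const facts = {
      filerAge:   { input: true },
      totalWages: { input: true },
      standardDeduction: {
        deps: ["filerAge"],
        compute: ({ filerAge }) => (filerAge >= 65 ? 16550 : 14600),
      },
      taxableIncome: {
        deps: ["totalWages", "standardDeduction"],
        compute: ({ totalWages, standardDeduction }) =>
          Math.max(0, totalWages - standardDeduction),
      },
    };

    // Resolve a fact by evaluating its dependencies first (no cycle checks here).
    function evaluate(name, inputs, cache = {}) {
      if (name in cache) return cache[name];
      const fact = facts[name];
      if (fact.input) return (cache[name] = inputs[name]);
      const deps = Object.fromEntries(
        fact.deps.map((d) => [d, evaluate(d, inputs, cache)])
      );
      return (cache[name] = fact.compute(deps));
    }

    evaluate("taxableIncome", { filerAge: 70, totalWages: 50000 }); // 33450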
This way it's a lot terser, you have auto-completion and real-time type-checking.
The code that processes the graph will also be simpler as you don't have to parse the XML graph and turn it into something that can be executed.
And if you still need XML, you can generate it easily.
Decabytes 1 hours ago [-]
S-expressions are a cheap DSL too. I use them in the wasm-powered desktop browser runtime that I'm developing, as the “HTML”^1 and CSS^2. In fact it works so well that I also reused it to do the styling for HTML exports in my markup language designed to fight documentation drift^3.
1. https://gitlab.com/canvasui/canvasui-engine/-/blame/main/exa...
2. https://gitlab.com/canvasui/canvasui-engine/-/blob/main/exam...
3. https://gitlab.com/sablelang/libcuidoc
YAML seems like a great middle ground here between XML and JSON...
DamonHD 6 minutes ago [-]
I have worked with a lot of languages over decades, including YAML, and I regard it as one of the worst that I have tangled with, for a number of reasons...
IshKebab 56 minutes ago [-]
YAML is never a great anything.
librasteve 1 hours ago [-]
I have been playing with DSLs a little, here is the kind of syntax that I would choose:
invoice "INV-001" for "ACME Corp"
item "Hosting" 100 x 3
item "Support" 50 x 2
tax 20%
invoice "INV-002" for "Globex"
item "Consulting" 200 x 5
discount 10%
tax 21%
In contrast to XML (even with authoring tools), my feeling is that XML (or any angle-bracket language tbh) is just too hard to write correctly (i.e. XML syntax and XML schema parsing are very unforgiving) and has a lot of noise when you read it that obscures the main intent of the DSL code.
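And FWIW a DSL like this is cheap to parse by hand too; a minimal line-based sketch in JavaScript (the grammar assumptions are mine):

    // Minimal parser for the invoice DSL above (indentation is ignored).
    function parseInvoices(src) {
      const invoices = [];
      for (const raw of src.split("\n")) {
        const line = raw.trim();
        if (!line) continue;
        let m;
        if ((m = line.match(/^invoice "([^"]+)" for "([^"]+)"$/))) {
          invoices.push({ id: m[1], customer: m[2], items: [] });
        } else if ((m = line.match(/^item "([^"]+)" (\d+) x (\d+)$/))) {
          invoices.at(-1).items.push({ name: m[1], price: +m[2], qty: +m[3] });
        } else if ((m = line.match(/^tax (\d+)%$/))) {
          invoices.at(-1).tax = +m[1] / 100;
        } else if ((m = line.match(/^discount (\d+)%$/))) {
          invoices.at(-1).discount = +m[1] / 100;
        } else {
          throw new Error(`unrecognized line: ${line}`);
        }
      }
      return invoices;
    }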
librasteve 1 hours ago [-]
Raku's built-in Grammars can be used to parse this. I can see Raku generating the XML as the Actions from this Grammar, allowing ease of DSL authoring with XML as an interchange and strict schema-validation format.
This looks fun but I’d rather have the free direct filing service they discontinued.
sdovan1 1 hours ago [-]
Sometimes I wonder why we need to invent another DSL (or when we should).
At work, we have an XML DSL that bridges two services. It's actually a series of API calls with JSONPath mappings.
It has if-else and goto, but no real math (you can only add 1 to a variable though) and no arrays.
Debugging is such a pain, makes me wonder why we don't just write Java.
wild_pointer 2 hours ago [-]
It's not so cheap, in terms of maintenance and mental load
dale_glass 46 minutes ago [-]
It kinda blows my mind that after XML we've managed to make a whole bunch of stuff that's significantly worse for any serious usage.
JSON: No comments, no datatypes, no good system for validation.
YAML: Arcane nonsense like sexagesimal number literals, footguns with anchors, Norway problem, non-string keys, accidental conversion to a number, CODE INJECTION!
I don't know why, but XML's verbosity seems to cause such a visceral aversion in a lot of people that they'd rather write a bunch of boring code to make sure a JSON parses to something sensible, or spend a day scratching their head about why a minor change in YAML caused everything to explode.
Actually, my own problem with XML was the annoyance that, back when I considered doing a complex config format in XML, modifying it programmatically while retaining comments turned out to be absolutely non-trivial. Compared to the mess one can make with YAML, though, that's a trivial thing.
ACCount37 41 minutes ago [-]
"Any serious usage" starts at "it just works".
JSON just works. Every language worth giving a damn about has a half-decent parser, and the syntax is simple enough that you can write valid JSON by hand. You wouldn't hit the edgy edge cases or the need to use things like schemas until down the line, by which point you're already rolling with JSON.
XML doesn't "just work". There are like 4 decent libraries total, all extremely heavy, that have bindings in common languages, and the syntax is heavy and verbose. And by the time you could possibly get to "advanced features that make XML worth using", you've already bounced off the upfront cost of having to put up with XML.
Frontloading complexity ain't great for adoption - who would have thought.
n_e 22 minutes ago [-]
> JSON: No comments, no datatypes, no good system for validation.
I don't agree at all. With tools like Zod, it is much more pleasant to write schemas and validate the file than with XML. If you want comments, you can use JSON5 or YAML, that can be validated the same way.
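For instance (a sketch; the schema fields are made up):

    import { z } from "zod";

    // The schema doubles as documentation and validator; parse() throws on mismatch.
    const Invoice = z.object({
      id: z.string(),
      items: z.array(z.object({ name: z.string(), qty: z.number().int() })),
    });

    const invoice = Invoice.parse(
      JSON.parse('{"id":"INV-001","items":[{"name":"Hosting","qty":3}]}')
    );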
scotty79 9 minutes ago [-]
How awesome would XML be if it didn't have attributes, namespaces and could close elements with </>
dndn2 2 hours ago [-]
I think the declarative calculations part is important; it's why I made calculang https://calculang.dev
Will check later if there's some interesting calcs here to transpose, I'm for more models being published by public bodies!
cl0ckt0wer 1 hours ago [-]
The subtext here is that XML is a powerful tool when generating code with LLMs
mikkupikku 51 minutes ago [-]
Yeah, but you get what you pay for.
thatwasunusual 1 hours ago [-]
It's completely unbelievable that so-called developed countries are struggling with this in 2026.
In Norway, we've had a more or less automated tax system for many years; every year you get a notification that the tax settlement is complete, you log in and check if everything is correct (and edit if desired) and click OK.
It shouldn't be more difficult than this.
bdangubic 1 hours ago [-]
how much do Norway tax preparation companies spend on lobbying Norway Politicians each year? :)
thatwasunusual 1 hours ago [-]
Having a proper system for handling citizens' main priorities is important. What happens in 3rd world countries is a struggle that UN++ needs to focus on.
bdangubic 16 minutes ago [-]
In the US of A not even health is a main priority. Tax prep would not crack the top 50.
raverbashing 1 hours ago [-]
Honestly let's leave XML in that 90s drawer from where it should have never left
rkomorn 1 hours ago [-]
XML is one of those things that fulfills the requirements I have but makes me say "not like this..."
baq 1 hours ago [-]
XML is better than yaml.
…note this doesn’t really say much. Both are terrible.
llm_nerd 1 hours ago [-]
XML is fantastic. XML with XSD and XSL(T) was godly for data flow systems. I mean, just having a well defined, verifiable date type was magical and something seemingly unfathomable for so many other formats.
What hurt XML was the ecosystem of overly complex shit that just sullied the whole space. Namespaces were a disaster, and when firms would layer many namespaces into one document it just turned into a magnificent mess that became impossible to manually generate or verify. And then poorly thought-out garbage specs like SOAP just made everyone want to toss all of it into the garbage bin, and XML became collateral damage of the backlash against terrible standards.
jgalt212 1 hours ago [-]
> Tax logic needs a declarative specification
preach. I'm convinced there are cycles in the tax code that can be exploited for either infinite taxes or zero taxes. Can Claude find them?