▲Quest for Permissively Licensed PDF Library in C# (duerrenberger.dev)

59 points by ingve 18 days ago | 62 comments

tom_alexander 5 days ago [-]

> obviously needs to be a PDF

I've been making my reports in self-contained HTML files[0] and it works out so much better than PDF. It is not constrained by paper sizes, and it lets me add some nifty features. For example, I recently added support for hiding columns in a table using exclusively CSS. The only downside is browsers can render things slightly differently, but for my use cases I don't need pixel-perfect identical rendering.

[0] Images are inlined base64-encoded, CSS/JS embedded with style and script tags. No external assets / no http requests.
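A minimal sketch of the self-contained approach described above, in Python for illustration only (the comment does not say which tooling is used). The helper names `to_data_uri` and `build_report` are hypothetical; the point is only that every asset is encoded into the page, so it makes no HTTP requests.

```python
import base64
import html


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a data: URI so the asset needs no external request."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_report(title: str, image_bytes: bytes, css: str) -> str:
    """Return a single HTML string with the image and stylesheet inlined.

    The title is HTML-escaped; the CSS is assumed to be trusted and is
    inserted as-is.
    """
    img_uri = to_data_uri(image_bytes, "image/png")
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title>\n'
        f"<style>{css}</style></head>\n"
        f"<body><h1>{safe_title}</h1>\n"
        f'<img alt="chart" src="{img_uri}"></body></html>\n'
    )
```

Writing the returned string to a single `.html` file gives a report that can be mailed or archived as one artifact.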

giancarlostoro 5 days ago [-]

You can also use media queries for print-specific styling, so you can remove things that a user doesn't need to print out:

https://developer.mozilla.org/en-US/docs/Web/CSS/Guides/Medi...

dmboyd 5 days ago [-]

Being constrained by page sizes is “a feature, not a bug” in most contexts. If I’m calling out numbers on the 3rd line of page 38 of a report, it helps if that’s consistent.

kgwxd 5 days ago [-]

The only reason PDFs still have a job is: pixel perfect consistency; the built-in validity stuff (ensuring the document wasn't altered, etc.); or the customer doesn't need the other things, but isn't open to alternatives. Otherwise, PDF is just a major headache.

wongarsu 5 days ago [-]

Also page-level consistency, and generally laying things out in a printable format.

Even with the same word document opened only in various MS Word versions (web, desktop, etc) you won't get consistent page numbers. And HTML tables work great on screen but don't print very well if they span more than what fits on a single sheet of paper

dwroberts 5 days ago [-]

Unless you can embed fonts [into the page itself] you aren’t beating PDF

giancarlostoro 5 days ago [-]

Not only can you embed the fonts, but you can make it interactive and output a PDF if you really wanted to. The HTML might grow if you embed enough JS, but on the other hand... some PDFs are insanely large.

fuzzy2 5 days ago [-]

Not a problem with data: URIs. But then, a report may not need fancy fonts if HTML is acceptable.

gnomewascool 5 days ago [-]

You can embed fonts into an HTML page. For example, place an @font-face with the src:url being a base64-encoded blob, in a style element.
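A rough sketch of that `@font-face` technique, written in Python purely as an illustration. The function name `font_face_css` and its defaults are assumptions, and the font bytes are assumed to be a WOFF2 file supplied by the caller.

```python
import base64


def font_face_css(
    family: str,
    font_bytes: bytes,
    mime: str = "font/woff2",
    fmt: str = "woff2",
) -> str:
    """Build an @font-face rule whose src is a base64 data: URI.

    The result is meant to be placed inside a <style> element of the page.
    """
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url(data:{mime};base64,{encoded}) format("{fmt}");\n'
        "}\n"
    )
```

Base64 inflates the payload by roughly a third, so subsetting the font before embedding is usually worth considering.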

mythz 5 days ago [-]

The wider .NET ecosystem is lacking when you try to step outside the mainline. I don't bother hunting for unused, partially implemented .NET libraries anymore and just call out to a process or API when I need to get something done.

It's not ideal, but when a good option isn't available in .NET, one usually is in Python/npm. Typically I'll use background jobs when calling out of process, for added resiliency/replayability and observability.
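A hedged sketch of the "call out to a process" pattern with a timeout and simple retries. It is written in Python only for illustration; the comment describes doing this from .NET, and the same shape applies in any host language. The function name `run_tool` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_tool(cmd: list[str], payload: str, timeout: float = 30.0, retries: int = 2) -> str:
    """Run an external command, send payload on stdin, and return its stdout.

    Retries on a non-zero exit code or a timeout, and raises RuntimeError
    once all attempts are used up.
    """
    last_error = ""
    for attempt in range(retries + 1):
        try:
            result = subprocess.run(
                cmd, input=payload, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            last_error = f"timed out after {timeout}s"
        else:
            if result.returncode == 0:
                return result.stdout
            last_error = result.stderr.strip() or f"exit code {result.returncode}"
        if attempt < retries:
            time.sleep(0.1 * (attempt + 1))  # small linear backoff between attempts
    raise RuntimeError(f"{cmd[0]} failed after {retries + 1} attempts: {last_error}")


if __name__ == "__main__":
    # Stand-in child process: upper-cases whatever arrives on stdin.
    child = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(run_tool(child, "hello"))
```

Passing data over stdin/stdout keeps the child process stateless, which is what makes a failed job easy to replay.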

cm2187 5 days ago [-]

Not sure I agree. It also depends on the domain. The Python ecosystem is of course a lot richer for anything AI. But try to open, manipulate and export spreadsheets. In Python you pretty much need a different library for every Excel file format (xls, xlsx, etc.) and usually the more file formats a library can handle, the less capable it is (e.g. pandas). In .NET you have libraries like SpreadsheetGear that are super powerful, including their own Excel calculation engine. I see nothing remotely close in Python.

exyi 5 days ago [-]

The point is that a good library usually exists for some language, which is not necessarily the one you are currently using.

IMHO, we don't lack good libraries in XY, we are lacking good interop. Going through REST or stdio is quite painful just to render PDF (or export spreadsheet, ...)

cm2187 4 days ago [-]

cough cough... COM

pjmlp 5 days ago [-]

There is hardly anything that isn't available in .NET, the main problem is being willing to pay for tooling.

mythz 5 days ago [-]

I'm using a lot of ComfyUI Workflows, Custom Nodes, Image and Audio classifiers relying on PyTorch, supervision, ultralytics, MediaPipe, OpenCV, onnxruntime, pandas, numpy that say otherwise. There are some equivalents, but the ecosystems aren't playing in the same ballpark.

pjmlp 4 days ago [-]

Last time I checked, "there is hardly anything" doesn't mean there is nothing left uncovered.

And even so, there are .NET bindings for a few of those libraries like PyTorch, OpenCV and ONNX Runtime.

mythz 4 days ago [-]

You're misrepresenting reality, companies aren't choosing .NET for their AI workflows. There's nothing like ComfyUI, the ecosystems are worlds apart, it's not even close.

pjmlp 4 days ago [-]

You're distorting my words, I haven't said anything specific about AI or world domination via .NET.

Only that plenty of use cases are covered, more so when you're willing to actually pay for tooling.

I never mentioned that .NET was on the world domination path for AI libraries.

If folks would rather use an interpreted language, CPU vendors will appreciate it.

mythz 4 days ago [-]

> There is hardly anything that isn't available in .NET, the main problem is being willing to pay for tooling.

Here you're saying .NET has nearly everything, you just need to pay for it sometimes.

> And even so, there are .NET bindings for a few of those libraries like PyTorch, OpenCV and ONNX Runtime.

As apparently AI is no problem for .NET either since it has some bindings. So I really didn't need to use Python if I was prepared to pay for some tools as "There is hardly anything that isn't available in .NET" - misrepresent the situation much?

As if that does anything to help the different Python packages you need. Yeah you could rewrite every Python package built on top of it, or you know, shell out to a process or API.

pjmlp 4 days ago [-]

Nearly everything, means not everything, that something is not covered, AI tools for example.

Existing PyTorch, OpenCV and ONNX Runtime bindings don't mean there is a solution for every Python package out there.

However, I will state, since you're the one driving the discussion down this path, that there are many AI scenarios that are nicely taken care of by .NET and Windows tooling like WindowsML, which work well enough for many Microsoft shop scenarios.

Not everything needs to be Python.

mythz 4 days ago [-]

> Nearly everything, means not everything, that something is not covered, AI tools for example.

Yeah I know some things that aren't covered, like everything I've listed in my first comment that I'm currently shelling out to Python for.

> Existing PyTorch, OpenCV and ONNX Runtime bindings don't mean there is a solution for every Python package out there.

So why are you trying to use some scattered bindings to misrepresent .NET's AI capabilities? Your intent has been to say there's no need to use anything else since .NET basically has it all - to someone who needs to shell out to Python, because .NET didn't have what I needed.

> many AI scenarios that are nicely taken care of by .NET and Windows tooling like WindowsML, which work well enough for many Microsoft shop scenarios.

So instead of shelling out to Python, I could've just... replaced my inexpensive Linux servers and deployed to Windows Servers and Azure??? Thanks for the advice, but I'll stick to the easiest solution that actually works.

> Not everything needs to be Python.

No, just everything I need to shell out to Python for, i.e. my entire point.

neonsunset 4 days ago [-]

[dead]

eXpl0it3r 4 days ago [-]

Is this for your personal workflow, or for applications that you ship?

How do you handle deployment / packaging of multiple, different ecosystems?

thiago_fm 5 days ago [-]

This looks like ChatGPT. There are PLENTY of alternatives on the post.

Python and others have similar issues and limitations as well.

mythz 5 days ago [-]

It wouldn't be a quest if there were lots of good options, a few good options is better than lots of unused/unmaintained ones.

smithkl42 5 days ago [-]

We've been using Aspose.PDF for the last 10 years or so in our C# platform, and paying for the license. It's expensive and buggy and has shite support, so a year or so back I decided to see if there was some other library or combination of libraries that could meet our needs. Basically, we needed:

* HTML to PDF

* Compress PDF

* Manual PDF generation

* Text extraction

* No browser engine or other weird dependencies

I researched every library I could find, and downloaded, integrated and tested anything that looked remotely promising.

At the end of all that, I reluctantly handed my company credit card back to Aspose. There simply wasn't any open-source or even just cheaper PDF library that I could actually make work, and all the other paid ones that did work were even more expensive.

c0wb0yc0d3r 5 days ago [-]

Aspose is the library I’ve used commercially in the past, too. My experience was similar. The company I worked for at the time eventually charged more for PDF export as a paid add on. The software is very sticky so the people who truly needed pdf export directly paid, the rest relied on export to word then “printed” the pdf themselves.

dfcab 5 days ago [-]

I am in the same boat. Aspose has been the go-to for Word and PDF documents. I will say, Adobe's PDF Services API offers a ton of interesting features, but it comes with a price tag and, in my scenario, it's not HIPAA compliant.

kappadi3 5 days ago [-]

[flagged]

GiorgioG 5 days ago [-]

Stop spamming your own service.

Archelaos 5 days ago [-]

I create PDF files from C# using LaTeX as an intermediate format. This works very reliably but sometimes takes a bit of tinkering until everything fits.
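A small sketch of what a LaTeX-as-intermediate-format pipeline can look like, in Python for illustration only (the comment describes doing this from C# and does not show its code). The helper names are hypothetical, and `compile_pdf` assumes a `pdflatex` binary from a TeX distribution is on the PATH.

```python
import subprocess
from pathlib import Path

# Characters with special meaning in LaTeX, mapped to safe replacements.
_TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_escape(text: str) -> str:
    """Escape user-supplied text character by character (no double escaping)."""
    return "".join(_TEX_ESCAPES.get(ch, ch) for ch in text)


def render_tex(title: str, body: str) -> str:
    """Fill a tiny article template with escaped content."""
    return (
        "\\documentclass{article}\n"
        "\\begin{document}\n"
        f"\\section*{{{tex_escape(title)}}}\n"
        f"{tex_escape(body)}\n"
        "\\end{document}\n"
    )


def compile_pdf(tex_source: str, workdir: Path, jobname: str = "report") -> Path:
    """Write the .tex file and run pdflatex on it (assumed to be installed)."""
    tex_path = workdir / f"{jobname}.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        [
            "pdflatex",
            "-interaction=nonstopmode",
            "-halt-on-error",
            f"-output-directory={workdir}",
            str(tex_path),
        ],
        check=True,
        capture_output=True,
        timeout=120,
    )
    return workdir / f"{jobname}.pdf"
```

Escaping every interpolated value is the part most easily forgotten; an unescaped `%` or `&` in report data is a common source of the "tinkering" mentioned above.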

People here on HN recently recommended Typst as a replacement for LaTeX, but I haven't tried it myself yet.

eXpl0it3r 4 days ago [-]

Just today I looked at LaTeX interop for C#, but it seems the TeX world is in its own bubble of commandline tools.

Do you use any library or are you just calling the standard TeX CLI tools?

actionfromafar 18 days ago [-]

In my eyes, PdfSharpCore¹ is now the "canonical" version of pdfcore.

IMHO the list is incomplete without it.

1: https://github.com/ststeiger/PdfSharpCore

eXpl0it3r 18 days ago [-]

It seems the PDFSharp rabbit hole goes even deeper than I realized!

The latest MigraDoc & PDFSharp seem to have been updated and ported to .NET 6 after a lot of the forks happened, so it was unclear to me whether there's merit in looking at other, mostly abandoned forks.

I might add PdfSharpCore, though the use of SixLabors.ImageSharp and SixLabors.Fonts leads to a disqualification from the "quest", given their custom split license [1]

Edit: Actually, the license seems to turn into an Apache 2.0 license, when used with an open source licensed project and also as transitive dependency. Certainly a confusing license.

[1] https://github.com/SixLabors/ImageSharp/blob/main/LICENSE

actionfromafar 18 days ago [-]

Edit: PSA - PdfSharpCore uses older releases of SixLabors.ImageSharp v1.0.4 and Fonts-1.0.0-beta17 which both were (and are still) distributed under plain Apache-2.0.

https://web.archive.org/web/20251104163604/https://codeload....

eXpl0it3r 17 days ago [-]

Good to know, thank you!

Though, makes me wonder how much "old code" this is then collecting...

actionfromafar 4 days ago [-]

If it works, it works. And specifically, Works For Me ™ :)

tonyedgecombe 5 days ago [-]

>Naturally, I first started looking for permissively licensed libraries, which could be used free of charge and without additional license requirements.

There is a lot of work in a good PDF library, expecting to get it for free feels unreasonable to me.

unethical_ban 5 days ago [-]

Given reality, it would be silly for a consumer not to look for the cheapest option available, that doesn't have vendor lock-in.

That said, for many niche products you are correct.

kappadi3 5 days ago [-]

If you ever revisit alternatives, you might want to try YakPDF. It gives you:

- HTML → PDF without any browser engine

- PDF compression & optimization

- Simple API for manual PDF generation

- Text extraction

- No native dependencies and cheaper than Aspose

It’s not a full drop-in replacement for every Aspose feature, but it covers the core workflow you mentioned and is much lighter to integrate.

https://rapidapi.com/yakpdf-yakpdf/api/yakpdf (open via firefox)

GiorgioG 5 days ago [-]

YakPDF (as far as I can tell) is an API and not a library that generates a PDF. If you're going to go that route, host https://github.com/gotenberg/gotenberg yourself and call it a day.

edit: Stop spamming your own service.

flanbiscuit 5 days ago [-]

I needed this post a year ago when I was looking for this exact thing. I did end up going with Puppeteer because I needed it for something else that I couldn't avoid. I use a large list of flags with it to launch the most minimal version of headless Chrome that I can.

I am going to look into switching to MigraDoc and see if I can drop Puppeteer.

Thanks for this great research!

eXpl0it3r 4 days ago [-]

You're welcome!

Having played around with MigraDoc for the past few weeks, I do still recommend it, as long as you don't need more complex layouts. Here's a short and certainly incomplete list of limitations that I've run into so far:

- No tables within other tables

- No multi-column page layouts

- No multi-section on the same page (new section = new page)

- No letter spacing

- MigraDoc doesn't know about the final spacing, so you can't adjust, say, the width of some table column automatically. Either calculate an estimate based on the text/content or space them equally.

- Can't shade (background color) only a selection of words in a text

- Lists can only have up to three different symbols

- List indentation can behave quite strangely, due to tab stops

- No horizontal rule (can be emulated)

- There's a bug with bottom border of a paragraph

On the other hand, MigraDoc & PDFsharp are less than 1MB and plenty fast, so it's a great package, as long as you can build some workarounds to achieve the desired look.
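One way the "estimate column widths from the content" workaround from the list above could look, sketched in Python rather than against the MigraDoc API. The function `estimate_column_widths` is hypothetical, and character count is only a crude proxy for rendered width, since it ignores font metrics.

```python
def estimate_column_widths(rows, total_width_cm, min_width_cm=1.0):
    """Split total_width_cm across columns in proportion to the longest cell.

    Every column gets at least min_width_cm. If the table is too narrow for
    that, fall back to spacing the columns equally.
    """
    n_cols = len(rows[0])
    # Longest text per column, counted in characters (at least 1).
    weights = [
        max(1, max(len(str(row[i])) for row in rows)) for i in range(n_cols)
    ]
    spare = total_width_cm - min_width_cm * n_cols
    if spare < 0:
        return [total_width_cm / n_cols] * n_cols
    total_weight = sum(weights)
    return [min_width_cm + spare * w / total_weight for w in weights]


if __name__ == "__main__":
    table = [["ID", "Description"], ["1", "A fairly long line"]]
    print(estimate_column_widths(table, total_width_cm=16.0))
```

The resulting numbers would then be assigned to the table's column widths before rendering; measuring the text with the actual font would be more accurate if the library in use exposes such a call.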

mimi_007 4 days ago [-]

You can skip low-level PDF libs entirely and just generate the PDF from an HTML/CSS template (or plain HTML or URLs). With PDFBolt you design the layout once in HTML/CSS + Handlebars, then your C# code sends JSON and gets a PDF back from the API. That avoids the whole DOM/layout/rendering headache and you don’t have to ship a browser engine yourself.

It’s not a replacement for permissively licensed libraries when you need everything local, but for reports/invoices/etc. it can save a lot of time.

gpvos 5 days ago [-]

When I used PdfSharp about 9 years ago, it wasn't really designed to import arbitrary PDFs; it crashed or hung on many less common constructs or invalid PDF files. It was really only designed to either create PDFs or edit PDFs created by itself (or MigraDoc, which used it); that it could also import some other PDFs was considered a bonus by its maintainers. I submitted some patches back then to fix the most egregious problems. Hopefully it has improved.

We needed a library to read arbitrary PDF files (although I forgot what exactly we needed to read from them; it wasn't for full rendering) and ended up using PdfSharp, because iText did not respond to our pricing request.

eXpl0it3r 4 days ago [-]

Most of my research was directed at PDF creation and less at editing, so I'm not sure the list captures everything available.

For example I somehow dismissed PdfPig [1] early on, because it's mostly for text extraction from PDFs, but it does support some basic editing of sorts.

[1] https://github.com/UglyToad/PdfPig

bob1029 5 days ago [-]

My favorite approach for PDF rasterization was to interop with a simple, custom Java console application that leveraged Apache PdfBox.

This lasted until the log4j exploit, at which point we had to abandon it altogether due to our customers (banks) having a complete meltdown over it at the time.

It's probably still a really good option. I would definitely go back to it in a different context.

eXpl0it3r 4 days ago [-]

PdfPig [1] is a (partial?) port of PdfBox, I haven't really tried it so far, due to the weak support for PDF creation.

[1] https://github.com/UglyToad/PdfPig

fuzzy2 5 days ago [-]

Oh yeah, PDF. In a past project I created a monster solution:

  * Scriban to fill in templates (LaTeX)
  * Custom Angular SSR to reuse frontend components (charts etc)
  * Playwright to convert SSR output to PDF
  * LuaLaTeX to convert LaTeX document + stuff to PDF

Super slow, but very high quality results. Do not try this at home!

Scriban is totally awesome though.

ThomasMidgley 5 days ago [-]

Very good post, thank you!

Partially OT:

Can anyone recommend a "print PDF to Laserprinter" library?

I have been looking for a library for C# for some time now that would allow me to print PDF files on a laser printer programmatically. However, I cannot find one. Until now, I have been using Foxit Reader, which I call up via the command line. But this is not ideal for various reasons.

eXpl0it3r 4 days ago [-]

Do you know what Foxit Reader does for the "printing" part? Does it convert it to XPS or PS?

klysm 5 days ago [-]

If you are looking for a solution to generate PDF reports, I highly recommend using typst

marsven_422 5 days ago [-]

[dead]

wolvesechoes 4 days ago [-]

> Naturally, I first started looking for permissively licensed libraries, which could be used free of charge and without additional license requirements.

Tick looking for the host.

hermitcrab 5 days ago [-]

I've been looking for a while for a C++ library that can recognize and extract data tables reliably from PDF documents. Open source or commercial. Anyone know of one?

sander1095 5 days ago [-]

Thanks for this post! I've wanted to create such a post for a long while but never got around to it. Yours is fantastic!

eXpl0it3r 4 days ago [-]

Thank you, glad you liked it!

If you have any libraries on your list that are missing from mine, let me know!

thiago_fm 5 days ago [-]

The wrappers for wkhtmltopdf look to me like the best candidates.

For which use cases is needing Qt WebKit an issue?

eXpl0it3r 4 days ago [-]

wkhtmltopdf repository has been archived on GitHub. Maybe if you're lucky, someone will make a fork, but it's quite hard for a fork to get traction and retain some user base / not to be immediately discarded as well.

Qt WebKit is, compared to Chromium, quite light-weight at 30-40MB. However it's still quite large, compared to say MigraDoc + PDFsharp at 1MB. Truthfully, if there was a well maintained project using it, I'd have actually considered it.

Though Qt WebKit also moves you into the murky water of copy-left licensing, as Qt is released under LGPL. With the setup of using a shared native library, this should work, but might still not be something everyone wants to touch.

pabs3 5 days ago [-]

wkhtmltopdf is unmaintained and deprecated though.

giancarlostoro 5 days ago [-]

At work we were using, I think, GDPicture? It is now called Nutrient. They started out with a flat fee, royalty free, then their pricing scheme became more hostile over time (per-developer, per-application licensing, and I don't recall if they wanted to know how many users - which is crazy unless it's a SaaS). Friends (former coworkers) and family ask me for advice on which software libraries to use for what, since they know I'm a hyper nerd for that sort of thing. Last time a former coworker asked what PDF library to use, I told them to avoid Nutrient like the plague. There's wanting to be sustainable and then there's greed.

So yeah, I too was looking for permissive licensing. The worst part is that now it's drastically harder for me to suggest any paid alternatives, because we don't know that the alternative won't hike up prices on us. It's a really awful spot to be in.

davsti4 5 days ago [-]

PDFlib - I've used it since 2001. Their pricing is stable, and they've been flexible over the years as computing models have shifted.

yread 5 days ago [-]

I went through a similar quest recently and I was a bit disappointed there isn't an easy way to convert markdown to pdf (without going through html and puppeteer). It sounds easy, right? I still get flashbacks of having to write a program on paper that outputs justified text, so I won't even attempt it...

eXpl0it3r 4 days ago [-]

Over the past week, I've implemented a Markdig renderer that renders into a MigraDoc document, which can be converted to PDF. For the limited use case it works quite well and I'm currently looking into open sourcing it with some other MigraDoc related stuff under MIT.

Justified text isn't exactly a standard Markdown feature though. I guess Markdown converters are in general not easy, since there are so many dialects and at best people just mix in pure HTML, requiring an HTML renderer as well.

Anyways, if you're interested in the Markdig to MigraDoc renderer, let me know.

yread 4 days ago [-]

Sure, that would be more than enough for what I need! Thanks!
