Wkhtmltopdf breaking internal links in generated PDF

archais · July 2, 2024, 8:00am

Hi all. Per my title, links to sections of a printed document do not work correctly when the document is PDFed.

As an example, when using the below:

<a href="#section-1">Section 1</a>

From the “Full Page” view, this link navigates to the element with the fragment ID section-1. But when the same document is converted to PDF (either by printing to PDF, or by pressing the “PDF” button to generate it with wkhtmltopdf), the fragment link is replaced with an absolute link to the site. E.g. the previous example becomes:

<a href="http://site.example.org/#section-1">Section 1</a>

Does anybody know how to resolve this?

Peer · July 3, 2024, 7:55am

Instead of:

<a href="#section-1">Section 1</a>

shouldn’t you rather write:

<a name="section-1">Section 1</a>

or indeed use the “id” attribute for that purpose, see:

archais · July 3, 2024, 8:23am

Hi Peer, thanks for the reply, but I don’t follow. The content that I’m linking to does already have an ID. The link:

<a href="#section-1">Section 1</a>

Is supposed to navigate to a section of the document, e.g:

<section>
    <h2 id="section-1">Section 1</h2>
    {{ section_1_body }}
</section>
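
One way to rule out a markup mistake in a print format like this is to check that every fragment link has a matching target. The following is a minimal sketch using only the Python standard library; the names `FragmentAudit` and `dangling_fragments` are made up for illustration and are not part of Frappe or wkhtmltopdf.

```python
from html.parser import HTMLParser


class FragmentAudit(HTMLParser):
    """Collect fragment links (href="#...") and anchor targets (id / name)."""

    def __init__(self):
        super().__init__()
        self.links = set()    # fragments referenced by <a href="#...">
        self.targets = set()  # values of id attributes and of <a name="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        # Only same-document links are of interest here.
        if tag == "a" and href and href.startswith("#") and len(href) > 1:
            self.links.add(href[1:])
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])


def dangling_fragments(html):
    """Return the fragment names that are linked to but never defined."""
    audit = FragmentAudit()
    audit.feed(html)
    return sorted(audit.links - audit.targets)
```

Running `dangling_fragments` on the rendered HTML of the print format should return an empty list if every link has a target, which would suggest the markup itself is fine and the problem lies in the PDF step.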

Peer · July 3, 2024, 8:29am

You named the attribute “href” and added a “#”.
I said: the attribute must be called “name”, with no “#” added.
The link says: don’t call it “name” any more, use “id” instead.

So you would refer to the items something like this:

<a name="section-1" /><h1>This is the start of Section 1</h1>

<a href="http://site.example.org/#section-1">Jump to Section 1</a>

or

<h1 id="section-1">This is the start of Section 1</h1>

<a href="http://site.example.org/#section-1">Jump to Section 1</a>

archais · July 3, 2024, 10:08am

The relevant elements in my print format are defined as you have shown in the second example: the h2 header element has an id attribute, and the linking a element has an href attribute that points to that id. The only differences between your example and my implementation are:

In my implementation the link appears before the header because the link is on the first page linking to the header in a later page.
The link only has the fragment, not the absolute URL, since the section is part of the same document that is being printed via Frappe print formats.

Peer · July 3, 2024, 10:45am

Ok, so if these possible errors are not present, then it seems it is indeed wkhtmltopdf and/or the associated mechanisms that break these links.
A markup error would have been an easier fix.

So this needs some more knowledge or digging into the code, in order to find out if and how this can be remedied.

archais · July 3, 2024, 11:00am

Indeed. Thank you for trying to help! In between other work, I’m poking and prodding at the Frappe pdf.py which uses the pdfkit wrapper for wkhtmltopdf.
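
One thing that could be tried at that layer is a pre-processing step that turns the absolute links back into plain fragments before the HTML is handed to pdfkit. This is only a sketch under assumptions: `SITE_ROOT` and `restore_fragment_links` are hypothetical names, the site root is taken from the example earlier in the thread, and whether wkhtmltopdf then treats the links as internal has not been verified here.

```python
import re
from urllib.parse import urldefrag

# Assumption: the site root that ends up prefixed to the fragment links.
SITE_ROOT = "http://site.example.org/"

# Matches href="..." or href='...' and captures the quote and the URL.
HREF_RE = re.compile(r'href=(["\'])(.*?)\1')


def restore_fragment_links(html, site_root=SITE_ROOT):
    """Turn href="<site_root>#frag" back into href="#frag".

    Links to other hosts or to other pages on the same site are left alone.
    """
    root = site_root.rstrip("/")

    def fix(match):
        quote, url = match.group(1), match.group(2)
        base, fragment = urldefrag(url)
        # Rewrite only links whose non-fragment part is exactly the site root.
        if fragment and base.rstrip("/") == root:
            return f"href={quote}#{fragment}{quote}"
        return match.group(0)

    return HREF_RE.sub(fix, html)
```

The cleaned HTML would then go to whatever call produces the PDF. A regular expression is used here to keep the sketch short; a real HTML parser would be more robust against unusual attribute formatting.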

Peer · July 3, 2024, 11:20am

Decades ago I did some stuff with the reportlab python library:

It still seems to be around. I found it a bit tedious at the time, but once the project was done, it just worked and worked and worked.

I don’t know if it learned in the meantime to transform some random html thrown at it into pdf output (which in frappe looks a bit like a quick-and-easy workaround type of printing nozzle).
If not, it means something like recreating the output of documents, tables etc. in a different kind of page writing language.

Maybe such a library will find its way into frappe one day, since there are already several “designer” type frappe projects intending to create “pixel perfect” output, and these already need to manipulate similar objects. Starting from there, it might be a small step to switch to a reportlab type pdf outputter.
Said differently, just one rather smallish abstraction step away from what there already is.

archais · July 11, 2024, 2:17pm

Thank you Peer. I ended up leaving the hrefs out, since they were a nice-to-have and not a pivotal requirement. I will definitely give ReportLab a look, thanks for the link.