PDF2: build failure #2415
I haven't tried this yet - but I know the DITA 1.2 specification document has some pretty overwhelming key use, that probably was not very well designed. The 1.3 specification uses much better design patterns in that area. I'm marking this |
I've managed to build the DITA 1.3 Base spec, but the DITA 1.3 Technical Content spec also fails with the JVM arg set to -Xmx12288m. That is 12GB of memory, and it still doesn't build a PDF that will end up about 15MB in size! Maybe it's because XEP is also Java-based and uses the same JVM? Perhaps AHF doesn't have the same issue? |
Looking at the build files, it seems that the XEP process is forked:
so the forked process has its own maxmemory set to it:
so you can try to add one more parameter to the transformation called "maxmemory" with a value like "1024m" because the Xmx you originally set does not seem to be used in this case. |
Thanks. I tried maxmemory with various values up to 12288m, and the build still failed. Apache FOP managed it! I went straight to a JVM arg set to -Xmx12288m. FOP even managed the DITA 1.2 spec, although it did find 1056 errors, all similar to the original one: [DOTJ047I][INFO] Unable to find key definition for key reference "topic-contains" in root scope. The href attribute may be used as fallback if it exists. This was the reason for reporting this in the first place. DOT 1.8.5 didn't report these errors. |
Could you also try FOP with less memory like -Xmx1024m? I think it should also work with it. |
I'm not sure how this is an issue. It's always been the case that you may have to set the memory higher to handle certain documents. The fact that the memory requirement may be higher in 2.x doesn't really change that. |
As an anecdote that helps nothing: on the laptop I was using in 2010 (which was admittedly near end of life at the time), building the 1.2 spec in some situations actually caused my machine to shut down due to overheating. In other words - that document is not a "normal" DITA sample. The 1.3 spec is better designed, but the amount of linking is still extraordinary and should be expected to use a very large amount of memory. |
The original issue here reported two issues. The first is the memory issue, which I think has been addressed: the specification is a very large document, requiring an extraordinary amount of memory-intensive processing. As with any other similar document, additional memory must be allocated to allow the document to build. Worth noting: from the latest release going all the way back to 1.5.2 (the oldest version we still host), our documentation has for years described how to increase memory for such documents. The second issue is about the undefined key message that appears in 1.8.5, but not in 2.x:
This key exists with many similar keys in the DITA 1.2 specification source (not in 1.3), and is in fact not defined. However, it is used in a peculiar construct:
<section
    conref="../common/commonNavLibraryTable.dita#contentmodel-topic/contains"
    id="contains" otherprops="contains">
  <title>Contains</title>
  <p>The content model of this element may differ based on where it is used.
  Content model information is located here: <xref keyref="topic-contains"></xref></p>
</section>
The key is only ever used inside of a section that specifies |
No! It appears in 2.2.3, not in 1.8.5. |
Right - I was just editing my comment after I realized that I'd gotten that backwards, but will let the comment stand at this point. I think the message here is useless - the key can never be evaluated, so we shouldn't care if it is defined. That said - it's not a good design in the source. In fact, it's a really bizarre design in the source when looked at without knowing the original intent. I don't think there is any reason to try to make the code smart enough to detect this condition, and avoid warning about it. Basically - we have a key in the content, and that key is never defined. The fact that it's used in a location that won't result in anything useful is not really the point - the message is correct that we have a key in the source, and that key is never defined. |
This has to be a change to the order of preprocessing, with key resolution now happening before content reference resolution. Ideally the preprocessing would resolve all direct URI conrefs, then do key space construction, then do the rest of the processing, including resolving key-based conrefs. Direct URI conrefs can't be affected by key resolution but definitely affect key space construction, as we see here. |
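To make the ordering argument concrete, here is a minimal toy model in Python. It is not DITA-OT code: the element content, the `LIBRARY` lookup table, and the helper names are all hypothetical stand-ins. It only illustrates why checking key references before resolving direct URI conrefs reports a key that disappears once the conref has replaced the section.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a topic whose section is replaced via conref.
TOPIC = """<topic id="t">
  <section id="contains" conref="lib.dita#lib/contains">
    <p>See <xref keyref="topic-contains"/></p>
  </section>
</topic>"""

# Hypothetical conref targets, keyed by the direct URI used in the conref.
LIBRARY = {
    "lib.dita#lib/contains": '<section id="contains"><p>Resolved content.</p></section>',
}

KEY_SPACE = set()  # "topic-contains" is deliberately left undefined


def resolve_direct_conrefs(root):
    """Swap every element that has a known direct URI conref for its target."""
    for parent in list(root.iter()):  # snapshot, so the tree can be edited safely
        for index, child in enumerate(list(parent)):
            target = child.get("conref")
            if target in LIBRARY:
                replacement = ET.fromstring(LIBRARY[target])
                replacement.tail = child.tail
                parent[index] = replacement


def undefined_keyrefs(root):
    """List keyref values that have no definition in the toy key space."""
    return [
        el.get("keyref")
        for el in root.iter()
        if el.get("keyref") and el.get("keyref") not in KEY_SPACE
    ]


# Order 1: validate key references before conref resolution.
keys_first = undefined_keyrefs(ET.fromstring(TOPIC))

# Order 2: resolve direct URI conrefs first, then validate key references.
root = ET.fromstring(TOPIC)
resolve_direct_conrefs(root)
conref_first = undefined_keyrefs(root)

print(keys_first)    # ['topic-contains']
print(conref_first)  # []
```

In this sketch the undefined key is only reported when validation runs before the conref is resolved, which is the behaviour difference being discussed here.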
I appreciate that large documents require more memory, and that a vast number of links exacerbates it. However, the 2.x architecture, I believe, is far more memory-intensive. XEP with 1.8.5 is able to build the DITA 1.2 spec, but with 2.2.3 it cannot. Also, the free Apache FOP builds documents that the paid-for XEP cannot. The DOT controls the environment that XEP works in. Is this really an XEP issue? I cranked the memory right up to 12GB! |
Apologies for jumping in too quickly. |
@drmacro -
Just wanted to note that this case (resolving conref in the topic) doesn't actually affect the key space. The key space is the same before and after this conref is evaluated - in either case, the key |
Yes, I guess I mean key space construction and key reference validation/resolution. |
In general, I would not be surprised at this. The 2.x releases add a lot of processing to support DITA 1.3; the fact that it is based on a new version of the standard, with significant new processing requirements (branch filtering, key spaces), is likely to result in more memory requirements.
I'm assuming you've got the same version of XEP running in each case - please correct me if I'm wrong. If that's the case, then I can only think something in the
The FO file itself should be identical, possibly excepting the few features/extensions that are supported by XEP and fail in FOP. If the same FO markup works in FOP and runs out of memory in XEP, then the fact it's running within DITA-OT is not the issue. |
For what it's worth - when I generate
This is with DITA-OT 2.3, with a few of my own plugins in place - so there could be a bit extra in the FO, but those extra bits would be identical in the FO and XEP versions. I'm not sure what is causing that 30K extra in XEP - it's not much relative to the overall size, but I also don't know if that extra piece could somehow cause memory issues. I do see just a few rx: link attributes and rx: table header attributes. |
Even with this change, with both sets of core code at the same level of XSL, it's tough to do the diff - every generated ID differs by a digit or two. But it looks like most of the extra size probably comes from extra markup around index entries in the XEP version. I don't have a copy of XEP handy to actually build that version of the FO, but I wonder if the FO generated for FOP will run cleanly in XEP. If so, then it would seem that the index markup is the problem. If not -- that is, if the exact same FO file builds in FOP and fails in XEP -- then this would clearly be an issue with XEP's memory management for this document, keeping in mind that this is a very unusual document. |
I took the FO generated for FOP, handed it to a co-worker to test with XEP, and he verified that building that FO file in XEP (outside of the DITA-OT process) also resulted in a Java Heap Failure after page 724. In the FOP output, which wouldn't match exactly but should be similar, page 724 (of 1256) is in the middle of the bookmap section of the language specification. |
If someone could ZIP that FO file and send it to me, we would be happy to examine what the issue is. You could ZIP it and email it to kevin at renderx.com |
Here are the various files, attached. DITA1.2-DOT2.3.0-XEP4.2.5-topic.fo.zip - fail |
I had no issues processing all three of those files with RenderX to PDF, except that none of the images were provided in the archive. Can someone recreate those with the associated images so I can test (or, if they share images, just zip up one set of images for me)? Kevin Brown RenderX |
I'm attaching a zip of the |
closer, a few more images needed: langref/images/imagemapworld.jpg of type null |
Wed 06/15/2016_13:24:16.92 1210 pages, about 36.5 secs, 1024MB memory using Java 1.8. PDF attached. result.zip contains the full log. Memory consumption could be reduced and speed increased if validation was turned off. |
I would note a few warnings about value of start-indent, I would need some time to study where these are coming from. |
Thanks for the follow-up @kbrown01. If it's a Java issue, there isn't much we can do in the toolkit. Turning off validation could be done but I doubt we want that as the default - if there are problems in the FO, it's better to have XEP report them. |
It's not a Java issue except statements like "I set Java -Xmx to {some huge value}" ... |
I'm using a Mac. The JVM is built into Oxygen 18. From their webpage:
|
If you are formatting through Oxygen (meaning you have the FO open in Oxygen and you execute the format through XEP), then what does your xep script have inside it? How is it referencing Java and memory? |
I also formatted using 32-bit Java 1.6 with -Xmx1024m without issue. There is no issue formatting this document in 1024MB of memory or less on both 32-bit and 64-bit Java. |
Not knowing what you all are writing about, it would be nice if someone reported that the PDF I posted is correct and no issues were found. It would also be nice to understand why (apparently) so many people are having issues. All I did was download that FO, change the links in it to my own local disk for the images (as they had paths from the creator), and format it without any issue whatsoever. I would guess folks have issues in using RenderX XEP and not understanding the scripts to call it (or how to call Java or even what Java they are using or something?) and the fact is that there is no problem with RenderX XEP. |
Apologies for the delay. I have looked through the PDF, and it looks like a bog standard DOT PDF. I can't see any particular issues. I've attached the DOT log file: |
The error message you are getting, java.lang.OutOfMemoryError: GC overhead limit exceeded, does not mean RenderX is out of memory; it comes from the fact that you are specifying more memory than Java can allocate. In no way can Java allocate -Xmx12288m in heap; that is 12GB. The next question is: what version of Java is this? /Applications/oxygen/.install4j/jre.bundle/Contents/Home/jre/bin/java Is that 64-bit Java or 32-bit Java? |
I didn't go straight to 12GB; I built it up incrementally. I don't recall where I started, nor the increments. I probably started around 1024m, then increased it. As it's a Mac, I'd assume it's 64-bit. I'd be really surprised if it's 32-bit. However, the JVM is built into Oxygen. Radu, @raducoravu , would need to confirm. |
I think it is, and it is 64-bit on MacOS. That said, I can run that FO to PDF in 1024MB on my PC, so again I do not understand the issue. If you take the FO you sent me and run RenderX separately, without all the toolkit stuff, does it run? Look in the XEP installation directory; on a Mac you should have xep (or xep.sh). You can edit that to add -Xmx1024m to the start command, and then you should be able to run: xep -fo /path/to/your/fo/file.fo -pdf and see what happens. |
Just had some time to test this on my side, downloaded the DITA 1.3 specs ZIP on Mac OSX, opened main DITA Map in Oxygen 18.0 and published to PDF using DITA OT 2.3 and XEP. As I was telling @DJBHollis before, the DITA OT build file called:
runs XEP like this:
and that ${maxJavaMemory} param value by default is 500M. |
I am very, very, very pleased and relieved to confirm that I can now build both the DITA 1.2 spec. and the DITA 1.3 All-Inc. spec. with DOT and XEP. Hallelujah! I had followed Radu's previous advice, but misunderstood. I set The values I used were:
This error is when there's not enough memory, not too much. The XEP memory was at the DOT default of 500m, and not enough. My Mac has 16 GB memory, so I felt that I could take it to 12 GB.
I'd be cautious about that. I watched the memory on the Mac OS Activity Monitor. With FOP, something called 'Launcher' appeared, and used 4GB, if I recall correctly. Watching the forked XEP, I saw two Java instances. The DOT Java maxed out at about 1.40 GB, and the XEP maxed out at about 1019 MB. They were both running at the same time, so significant overlap. If you switch to the one JVM for both, the setting would need to be 2560MB. I think I'd leave it with the two, but agree it needs to be documented. I don't know whether Given that I'm a lazy so-and-so, and use Oxygen to run DOT builds, I'd also suggest that Oxygen highlight the XEP memory requirement on the FO processor tab, or the Advanced tab. Eliot, @drmacro
But things move on. Microsoft has stopped support for XP, and seem to have moved on from the 32 bit vs. 64 bit memory issues. The Java 8 download page talks in terms of 32 bit and 64 bit browsers, and this seems to be the main justification for maintaining the 32 bit Java version. OK, there are bound to be old PCs out there. I don't know what base PC specs are like, or what typical users have. The point is, I don't think to myself, "I'm opening up Oxygen, it's a Java application, how much memory does it need?" It just happens. I think it should be more like that for the DOT. Watching the Activity Monitor, Java doesn't open up the allotted memory all at once, it increases during the process. So, if the settings I used were the default, say, it wouldn't mean that folks had to have PCs with at least 2.5 GB of free memory for DOT with XEP. Smaller docs would only use smaller amounts of memory. BTW, this is still less than the 1.5 GB JVM and 4 GB Launcher that FOP seems to use. Many thanks to everyone for support, advice and interest. I appreciate it! |
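As a rough check on the single-JVM figure of 2560MB mentioned above, the two observed peaks can be added and rounded up. This is only a sketch: the 512MB rounding step and the 1024MB-per-GB conversion are assumptions made for illustration, and Activity Monitor readings are approximate.

```python
import math

MB_PER_GB = 1024  # assumption: binary units


def combined_heap_mb(peaks_mb, step_mb=512):
    """Sum the observed peaks and round up to the next multiple of step_mb."""
    total = sum(peaks_mb)
    return math.ceil(total / step_mb) * step_mb


dot_peak_mb = 1.40 * MB_PER_GB  # DOT JVM peak reported above, about 1.40 GB
xep_peak_mb = 1019              # forked XEP JVM peak reported above

budget_mb = combined_heap_mb([dot_peak_mb, xep_peak_mb])
print(budget_mb)  # 2560
```

Because the two processes ran at the same time, summing the peaks gives a conservative upper bound rather than a measured requirement.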
Could you please add |
Just for the record, every so often someone asks about this on the XEP email list. Could folks please consider increasing the value of |
By default with Java 8 and newer the Xmx size is 1/4 of the internal memory so maybe we could remove the "Xmx" parameter completely: |
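For a sense of scale, here is a small worked calculation that takes the one-quarter rule stated above at face value. The 16 GB machine, the 500M toolkit default, and the 1019 MB XEP peak are the figures reported earlier in this thread; the function name is hypothetical, and actual JVM defaults can vary by version and platform.

```python
def default_max_heap_mb(physical_mb, fraction=0.25):
    """Default max heap under the assumed 'one quarter of physical memory' rule."""
    return int(physical_mb * fraction)


physical_mb = 16 * 1024     # the 16 GB Mac mentioned in this thread
toolkit_default_mb = 500    # value of maxJavaMemory reported above
xep_peak_mb = 1019          # observed peak of the forked XEP process

jvm_default_mb = default_max_heap_mb(physical_mb)

print(jvm_default_mb)                     # 4096
print(toolkit_default_mb >= xep_peak_mb)  # False: the explicit 500M cap is too small
print(jvm_default_mb >= xep_peak_mb)      # True: the JVM default would have sufficed
```

On that machine, dropping the explicit cap would have left enough headroom for this document, though smaller machines would get a proportionally smaller default.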
My preference would be to run XEP in the same JVM as DITA-OT itself, like we do with FOP. I tried to change this, but I was not able to find XEP API docs that would have allowed me to do this. IIRC it was about static/global configuration that we didn't want to use. |
Because we do not distribute a particular XEP version with Oxygen, the publishing needs to work with almost any XEP version the end user is using. Which would have meant that if you wanted to use XEP in the same JVM process you would have needed to isolate the classloaders, force XEP to use only the libraries (maybe older Xerces libraries for example) which are shipped with it in order for XEP not to use newer libraries shipped with DITA OT and break. So in such cases when trying to use a tool for which you do not control the used version I'm usually a fan of using a separate process for it. |
@raducoravu I agree. Both ways to run XEP should be supported. |
I was experimenting with using the DITA 1.2 spec. as sample content.
I'm using oXy 18 and XEP. The DITA 1.2 spec builds with the built-in DOT 1.8.5, but fails with the built-in DOT 2.2.3. It throws 1057 errors, all similar to:
[DOTJ047I][INFO] Unable to find key definition for key reference "topic-contains" in root scope. The href attribute may be used as fallback if it exists
It also throws an "Out of memory" error. The JVM arg is set to: -Xmx12288m
DOT 1.8.5 used a JVM arg of: -Xmx1028m.
I'm using a MacBook Pro with 16GB memory, and watched the Activity Monitor. Java took over 5GB. I appreciate this is only a crude guideline.