PDF2: build failure #2415

Closed
DJBHollis opened this issue Jun 14, 2016 · 46 comments
Labels
priority/low Low priority issue support Inquiries on how DITA/OT is intended to work

Comments

@DJBHollis

DJBHollis commented Jun 14, 2016

I was experimenting with using the DITA 1.2 spec. as sample content.

I'm using oXygen 18 and XEP. The DITA 1.2 spec. builds with the built-in DOT 1.8.5 but fails with the built-in DOT 2.2.3. It throws 1,057 errors, all similar to:

[DOTJ047I][INFO] Unable to find key definition for key reference "topic-contains" in root scope. The href attribute may be used as fallback if it exists

It also throws an "Out of memory" error. The JVM arg is set to: -Xmx12288m

DOT 1.8.5 used a JVM arg of: -Xmx1028m.

I'm using a MacBook Pro with 16GB memory, and watched the Activity Monitor. Java took over 5GB. I appreciate this is only a crude guideline.

@robander robander added the priority/low Low priority issue label Jun 14, 2016
@robander
Member

I haven't tried this yet - but I know the DITA 1.2 specification document has some pretty overwhelming key use, that probably was not very well designed. The 1.3 specification uses much better design patterns in that area. I'm marking this P3 for now because at this point I don't think the older 1.2 spec is a very good example of real / well-designed DITA content. [I also know that when producing the actual published spec, we had a lot of extensions in place to optimize processing of the keys, skipping some processing that we knew to be unnecessary.]

@jelovirt jelovirt added the support Inquiries on how DITA/OT is intended to work label Jun 15, 2016
@DJBHollis
Author

I've managed to build the DITA 1.3 Base spec, but the DITA 1.3 Technical Content spec. also fails, even with the JVM arg set to -Xmx12288m.

That is 12GB of memory, and still it doesn't build a PDF that will end up about 15MB in size!

Maybe it's because XEP is also Java based, and using the same JVM? Perhaps AHF doesn't produce the same issue?

@raducoravu
Member

Looking at the build files, it seems that the XEP process is forked:

  DITA-OT2.x/plugins/org.dita.pdf2.xep/build_xep.xml

so the forked process has its own maxmemory set to it:

     <java classname="com.idiominc.ws.opentopic.fo.xep.Runner" resultproperty="errCode"
      failonerror="${xep.failOnError}" fork="true" maxmemory="${maxJavaMemory}" taskname="xep">

so you can try to add one more parameter to the transformation called "maxmemory" with a value like "1024m" because the Xmx you originally set does not seem to be used in this case.
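To make the fork behaviour concrete, here is a minimal Java sketch (not DITA-OT code) of the kind of command line a forked `<java fork="true" maxmemory="...">` task ends up running. It assumes Ant passes `maxmemory` to the child JVM as `-Xmx`, uses the `500m` default for `${maxJavaMemory}` that is reported later in this thread, and treats the Runner arguments as hypothetical placeholders. The point it illustrates: the child JVM gets its own heap limit, so the `-Xmx` of the JVM that launched it does not carry over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: approximates the command line of a forked
 * <java fork="true" maxmemory="${maxJavaMemory}"> task.
 */
final class ForkedXepCommand {
    // Default reported in this thread for ${maxJavaMemory}.
    static final String DEFAULT_MAX_JAVA_MEMORY = "500m";

    static List<String> build(String javaExe, String classpath,
                              String maxJavaMemory, List<String> runnerArgs) {
        String heap = (maxJavaMemory == null || maxJavaMemory.isEmpty())
                ? DEFAULT_MAX_JAVA_MEMORY
                : maxJavaMemory;
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        // The child JVM's heap limit comes from here, not from the parent's -Xmx.
        cmd.add("-Xmx" + heap);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("com.idiominc.ws.opentopic.fo.xep.Runner");
        cmd.addAll(runnerArgs); // hypothetical placeholder arguments
        return cmd;
    }

    public static void main(String[] args) {
        // Even if the parent runs with -Xmx12288m, the forked process is capped at 500m.
        System.out.println(build("java", "lib/*", null, Arrays.asList("in.fo", "out.pdf")));
    }
}
```

Under this assumption, only a larger value for the property itself (for example `1024m`) raises the XEP process's heap; `new ProcessBuilder(cmd).inheritIO().start()` is one way such a command could be launched.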

@DJBHollis
Author

DJBHollis commented Jun 15, 2016

Thanks. I tried maxmemory with various values up to 12288m, and still the build failed.

Apache FOP managed it! I went straight to a JVM arg set to: -Xmx12288m.

FOP even managed the DITA 1.2 spec, although it did find 1056 errors all similar to the original one:

[DOTJ047I][INFO] Unable to find key definition for key reference "topic-contains" in root scope. The href attribute may be used as fallback if it exists

This was the reason for reporting this in the first place. DOT 1.8.5 didn't report these errors.

@raducoravu
Member

Could you also try FOP with less memory like -Xmx1024m? I think it should also work with it.

@DJBHollis
Author

DJBHollis commented Jun 15, 2016

I tried FOP with -Xmx1024m but that failed. I then went to -Xmx1536m, and that worked with the DITA 1.2 spec. and all three versions of the DITA 1.3 spec.

I've now created two new issues, to highlight the two individual problems: #2417 and #2418

@drmacro
Member

drmacro commented Jun 15, 2016

I'm not sure how this is an issue. It's always been the case that you may have to set the memory higher to handle certain documents. The fact that the memory requirement may be higher in 2.x doesn't really change that.

@robander
Member

As an anecdote that helps nothing: on the laptop I was using in 2010 (which was admittedly near end of life at the time), building the 1.2 spec in some situations actually caused my machine to shut down due to overheating. In other words - that document is not a "normal" DITA sample. The 1.3 spec is better designed, but the amount of linking is still extraordinary and should be expected to use a very large amount of memory.

@robander
Copy link
Member

robander commented Jun 15, 2016

The original issue here reported two issues.

The first is the memory issue, which I think has been addressed: the specification is a very large document, requiring an extraordinary amount of memory-intensive processing. As with any other similar document, additional memory must be allocated to allow the document to build. Worth noting: from the latest release going all the way back to 1.5.2 (the oldest version we still host), our documentation has for years described how to increase memory for such documents.

The second issue is about the undefined key message that appears in 1.8.5, but not in 2.x:

[DOTJ047I][INFO] Unable to find key definition for key reference "topic-contains" in root scope. The href attribute may be used as fallback if it exists

This key exists with many similar keys in the DITA 1.2 specification source (not in 1.3), and is in fact not defined. However, it is used in a peculiar construct:

<section
      conref="../common/commonNavLibraryTable.dita#contentmodel-topic/contains"
      id="contains" otherprops="contains">
  <title>Contains</title>
  <p>The content model of this element may differ based on where it is used.
  Content model information is located here: <xref keyref="topic-contains"></xref></p>
</section>

The key is only ever used inside of a section that specifies @conref -- meaning that the cross reference and the text around it will never be used. This was a holdover from early prototypes where we hoped to use a key for the content model, allowing it to be republished with alternate document types. That design fell through, but the text was left in, essentially as a code comment to remind people working with the source that the content model varies.
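A rough way to spot such "dead" key references in a topic is to look for elements with @keyref that sit inside an element carrying @conref, since the conref'd element's content is replaced by the referenced content. The Java sketch below uses the JDK DOM parser; the class name is hypothetical, it only checks ancestors (not the element itself), and it assumes the input has no DOCTYPE to resolve (real DITA files usually do, so a catalog or entity resolver would be needed in practice).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Sketch: lists @keyref values that sit inside an element carrying @conref. */
final class DeadKeyrefFinder {
    static List<String> keyrefsInsideConref(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse topic", e);
        }
        List<String> dead = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (!el.hasAttribute("keyref")) {
                continue;
            }
            // A conref'd ancestor has its content replaced, so this keyref is never resolved.
            for (Node n = el.getParentNode(); n instanceof Element; n = n.getParentNode()) {
                if (((Element) n).hasAttribute("conref")) {
                    dead.add(el.getAttribute("keyref"));
                    break;
                }
            }
        }
        return dead;
    }
}
```

Run against the section above, this would report `topic-contains`, while a keyref outside any conref'd element would not be listed.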

@DJBHollis
Author

The second issue is about the undefined key message that appears in 1.8.5, but not in 2.x:

No! It appears in 2.2.3, not in 1.8.5.

@robander
Member

Right - I was just editing my comment after I realized that I'd gotten that backwards, but will let the comment stand at this point.

I think the message here is useless - the key can never be evaluated, so we shouldn't care if it is defined.

That said - it's not a good design in the source. In fact, it's a really bizarre design in the source when looked at without knowing the original intent. I don't think there is any reason to try to make the code smart enough to detect this condition, and avoid warning about it. Basically - we have a key in the content, and that key is never defined. The fact that it's used in a location that won't result in anything useful is not really the point - the message is correct that we have a key in the source, and that key is never defined.

@drmacro
Member

drmacro commented Jun 15, 2016

This has to be a change to the order of preprocessing, with key resolution now happening before content reference resolution. Ideally the preprocessing would resolve all direct URI conrefs, then do key space construction, then do the rest of the processing, including resolving key-based conrefs. Direct URI conrefs can't be affected by key resolution but definitely affect key space construction, as we see here.

@DJBHollis
Copy link
Author

The first is the memory issue, which I think has been addressed ...

I appreciate that large documents require more memory, and that a vast number of links exacerbate it. However, the 2.x architecture, I believe, is far more memory intensive. XEP with 1.8.5 is able to build the DITA 1.2 spec., but with 2.2.3 it cannot.

Also, the free Apache FOP builds documents that the paid-for XEP cannot. The DOT controls the environment that XEP works in. Is this really an XEP issue? I cranked the memory right up to 12GB!

@DJBHollis
Author

Right - I was just editing my comment after I realized that I'd gotten that backwards, but will let the comment stand at this point.

Apologies for jumping in too quickly.

@robander
Member

@drmacro -

Direct URI conrefs can't be affected by key resolution but definitely affect key space construction, as we see here.

Just wanted to note that this case (resolving conref in the topic) doesn't actually affect the key space. The key space is the same before and after this conref is evaluated - in either case, the key topic-contains is undefined / not part of the key space. Resolution of conrefs in maps is a separate issue of course.

@drmacro
Member

drmacro commented Jun 15, 2016

Yes, I guess I mean key space construction and key reference validation/resolution.

@robander
Member

However, the 2.x architecture, I believe, is far more memory intensive.

In general, I would not be surprised at this. The 2.x releases add a lot of processing to support DITA 1.3; the fact that it is based on a new version of the standard, with significant new processing requirements (branch filtering, key spaces), is likely to result in more memory requirements.

XEP with 1.8.5 is able to build the DITA 1.2 spec., but with 2.2.3 it cannot.

I'm assuming you've got the same version of XEP running in each case - please correct me if I'm wrong. If that's the case, then I can only think something in the topic.fo file changed between versions. It's difficult to simply compare the files, because so many little things have changed as a result of fixes and new features, so I'm not sure what change would cause additional memory usage.

Also, the free Apache FOP builds documents that the paid for XEP cannot. The DOT controls the environment that XEP works in. Is this really an XEP issue? I cranked the memory right up to 12GB!

The FO file itself should be identical, possibly excepting the few features/extensions that are supported by XEP and fail in FOP. If the same FO markup works in FOP and runs out of memory in XEP, then the fact it's running within DITA-OT is not the issue.

@robander
Member

robander commented Jun 15, 2016

For what it's worth - when I generate topic.fo using the DITA 1.2 spec:

  • For FOP, the file is 26,200,795 bytes
  • For XEP, the file is 26,230,983 bytes

This is with DITA-OT 2.3, with a few of my own plugins in place - so there could be a bit extra in the FO, but those extra bits would be identical in the FOP and XEP versions. I'm not sure what is causing that 30K extra in XEP - it's not much relative to the overall size, but I also don't know if that extra piece could somehow cause memory issues. I do see just a few rx: link attributes and rx: table header attributes.

@robander
Member

Even with this change, with both sets of core code at the same level of XSL, it's tough to do the diff - every generated ID differs by a digit or two. But it looks like most of the extra size probably comes from extra markup around index entries in the XEP version. I don't have a copy of XEP handy to actually build that version of the FO, but I wonder if the FO generated for FOP will run cleanly in XEP. If so, then it would seem that the index markup is the problem. If not -- that is, if the exact same FO file builds in FOP and fails in XEP -- then this would clearly be an issue with XEP's memory management for this document, keeping in mind that this is a very unusual document.

@robander
Member

I took the FO generated for FOP, handed it to a co-worker to test with XEP, and he verified that building that FO file in XEP (outside of the DITA-OT process) also resulted in a Java Heap Failure after page 724. In the FOP output, which wouldn't match exactly but should be similar, page 724 (of 1256) is in the middle of the bookmap section of the language specification.

@kbrown01

If someone could ZIP that FO file and post it to me, we would be happy to examine what the issue is. You could zip it and email it to kevin at renderx.com

@DJBHollis
Copy link
Author

DJBHollis commented Jun 15, 2016

Here are the various files, attached.

DITA1.2-DOT2.3.0-XEP4.2.5-topic.fo.zip - fail
DITA1.2-DOT1.8.5-XEP4.2.5-topic.fo.zip - pass
DITA1.2-DOT2.3.0-FOP-topic.fo.zip - pass

@kbrown01

kbrown01 commented Jun 15, 2016

I had no issues processing all three of those files with RenderX to PDF … except that none of the images were provided in the archive.

Can someone recreate those with the associated images so I can test (or if they share images, just zip up one set of images for me?).

Kevin Brown

RenderX

@robander
Member

I'm attaching a zip of the resources/ directory for the specification, which should contain all images (along with a few other misc files related to the spec).
resources.zip

@kbrown01

Closer. A few more images are needed:

langref/images/imagemapworld.jpg of type null
temp/pdf/Configuration/OpenTopic/cfg/common/artwork/warning.gif of type null

@robander
Member

robander commented Jun 15, 2016

Attaching here:

imagemapworld
warning

@kbrown01

Wed 06/15/2016_13:24:16.92
(document [system-id file:/C:/Users/kbrown01/Desktop/DITA-OT/DITA1.2-DOT2.3.0-XEP4.2.5-topic.fo/DITA1.2-DOT2.3.0-XEP4.2.5-topic.fo](validate [validation OK])
(compile
(meta-info )
(masters
...
[1171][1172][1173][1174][1175][1176][1177][1178][1179][1180][1181][1182][1183][1184][1185][1186][1187][1188][1189][1190][1191][1192][1193][1194][1195][1196][1197][1198][1199][1200][1201][1202][1203][1204][1205][1206][1207][1208][1209][1210]))
Wed 06/15/2016_13:24:54.59

1210 pages, about 36.5 secs, 1024MB memory using Java 1.8.
It is unclear to me what the issue is. Perhaps folks should contact me about setups on their machines. Are you using 64bit Java or 32bit Java?

PDF attached. result.zip contains the full log. Memory consumption could be reduced and speed increased if validation was turned off.

result.zip

DITA1.2-DOT2.3.0-XEP4.2.5-topic.pdf

@kbrown01

I would note a few warnings about the value of start-indent; I would need some time to study where these are coming from.

@robander
Member

Thanks for the follow-up @kbrown01.

If it's a Java issue, there isn't much we can do in the toolkit.

Turning off validation could be done but I doubt we want that as the default - if there are problems in the FO, it's better to have XEP report them.

@kbrown01

It's not a Java issue, except for statements like "I set Java -Xmx to {some huge value}" ...
Well, if you are using 32-bit Java in your installation, that is impossible. On a PC, you could never get beyond 1248MB of memory, and maybe not even 1024MB, depending on your machine. On Linux/Unix that is about 2GB. I am asking if folks are using 32-bit Java because I tested with 64-bit Java, and even with memory set to 1024MB it is fine.

@DJBHollis
Author

I'm using a Mac. The JVM is built into Oxygen 18. From their webpage:

OS X 10.8 and later (Includes Java SE 8u72)

@kbrown01

kbrown01 commented Jun 15, 2016

If you are formatting through Oxygen (meaning you have the FO open in Oxygen and you execute the format through XEP), then what does your xep script have inside it? How is it referencing Java and memory?

@kbrown01

I also formatted using Java 1.6 32-bit with -Xmx1024m without issue. There is no problem formatting this document in 1024MB of memory or less on both 32-bit and 64-bit Java.

@kbrown01

Not knowing what you all are writing about, it would be nice if someone reported that the PDF I posted is correct and no issues were found. It would also be nice to understand why (apparently) so many people are having issues. All I did was download that FO, change the links in it to my own local disk for the images (as they had paths from the creator), and format it without any issue whatsoever.

I would guess folks are having issues because they don't understand the scripts used to call RenderX XEP (or how Java is called, or even which Java they are using?), and the fact is that there is no problem with RenderX XEP.

@DJBHollis
Author

Apologies for the delay. I have looked through the PDF, and it looks like a bog standard DOT PDF. I can't see any particular issues.

I've attached the DOT log file:
DITA1.2-DOT2.3.0-XEP4.2.5-log.txt

@kbrown01

This error message you are getting:

java.lang.OutOfMemoryError: GC overhead limit exceeded

is not RenderX running out of memory; it comes from the fact that you are specifying more memory than Java can allocate.

In no way can Java allocate -Xmx12288m of heap; that is 12GB.

The next question is what version of Java is this?

/Applications/oxygen/.install4j/jre.bundle/Contents/Home/jre/bin/java

Is that 64bit Java or 32bit Java?
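One quick way to answer that for a particular java binary (for example the bundled JRE path above) is a tiny diagnostic class. The class name is illustrative; `sun.arch.data.model` is a HotSpot-specific system property, so it may be `null` on other JVMs, while `os.arch` and `Runtime.maxMemory()` are standard.

```java
/** Prints which JVM is running and what heap limit it actually accepted. */
final class JvmInfo {
    /** "32" or "64" on HotSpot-based JVMs; may be null on other implementations. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model");
    }

    /** Maximum heap this JVM will use, in MiB (usually a little below the -Xmx value). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("java.home  = " + System.getProperty("java.home"));
        System.out.println("os.arch    = " + System.getProperty("os.arch"));
        System.out.println("data model = " + dataModel());
        System.out.println("max heap   = " + maxHeapMiB() + " MiB");
    }
}
```

After `javac JvmInfo.java`, running `/Applications/oxygen/.install4j/jre.bundle/Contents/Home/jre/bin/java -Xmx1024m JvmInfo` would show the data model and the heap limit that JVM really uses.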

@DJBHollis
Author

I didn't go straight to 12GB, I built it up incrementally. I don't recall where I started at, nor the increments. I probably started around 1024m, then increased it.

As it's a Mac, I'd assume it's 64-bit. I'd be really surprised if it's 32-bit. However, the JVM is built into Oxygen. Radu, @raducoravu, would need to confirm.

@kbrown01

I think it is, and it is 64-bit on macOS.

But that said, I can run that FO to PDF in 1024MB on my PC, so again I do not understand the issue.

If you take the FO you sent me and just run RenderX separately, without all the toolkit stuff, does it run?

Look in the XEP installation directory; on a Mac you should have xep (or xep.sh). You can edit that to add -Xmx1024m to the start command, and you should be able to run:

xep -fo /path/to/your/fo/file.fo -pdf

And see what happens?

@raducoravu
Member

raducoravu commented Jun 17, 2016

Just had some time to test this on my side, downloaded the DITA 1.3 specs ZIP on Mac OSX, opened main DITA Map in Oxygen 18.0 and published to PDF using DITA OT 2.3 and XEP. As I was telling @DJBHollis before, the DITA OT build file called:

DITA-OT2.x\plugins\org.dita.pdf2.xep\build_xep.xml

runs XEP like this:

 <java classname="com.idiominc.ws.opentopic.fo.xep.Runner" resultproperty="errCode"
      failonerror="${xep.failOnError}" fork="true" maxmemory="${maxJavaMemory}" taskname="xep">

and the default value of that ${maxJavaMemory} parameter is 500M.
I increased it to "1100m" and had no problems generating the PDF.
@jelovirt, should we maybe remove the fork="true" set on the XEP process so that it uses the same JVM as the DITA OT? From what I remember, the Apache FOP process runs in the same JVM process.
@infotexture, also the "maxJavaMemory" parameter is not documented anywhere. But its use is quite limited; probably right now only XEP is started as a separate process.

@DJBHollis
Copy link
Author

DJBHollis commented Jun 17, 2016

I am very, very, very pleased and relieved to confirm that I can now build both the DITA 1.2 spec. and the DITA 1.3 All-Inc. spec. with DOT and XEP. Hallelujah!

I had followed Radu's previous advice but misunderstood it: previously I set maxmemory; this time I set maxJavaMemory. It made all the difference!

The values I used were: maxJavaMemory=1024m and JVM=-Xmx1536m

@kbrown01

java.lang.OutOfMemoryError: GC overhead limit exceeded

This error occurs when there's not enough memory, not too much. The XEP memory was at the DOT default of 500m, which was not enough.

My Mac has 16 GB memory, so I felt that I could take it to 12 GB.

@jelovirt should we maybe remove the "fork="true"" set on the XEP process so that it uses the same JVM as the DITA OT? From I remember the Apache FOP process runs in the same JVM process.

I'd be cautious about that. I watched the memory on the Mac OS Activity Monitor. With FOP, something called 'Launcher' appeared, and used 4GB, if I recall correctly.

Watching the forked XEP, I saw two Java instances. The DOT Java maxed out at about 1.40 GB, and the XEP maxed out at about 1019 MB. They were both running at the same time, so significant overlap.

If you switch to the one JVM for both, the setting would need to be 2560MB.
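The 2560MB figure is the two heap limits added together: the DITA-OT JVM at -Xmx1536m plus XEP at maxJavaMemory=1024m. A small Java sketch of that arithmetic, using a hypothetical helper that understands the k/m/g suffixes the JVM accepts for -Xmx (a bare number is treated as bytes):

```java
import java.util.Locale;

final class HeapBudget {
    /** Converts a JVM size string such as "1536m", "1g" or "524288k" to MiB (truncating). */
    static long toMiB(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        String digits = s.substring(0, s.length() - 1);
        switch (unit) {
            case 'k': return Long.parseLong(digits) / 1024;
            case 'm': return Long.parseLong(digits);
            case 'g': return Long.parseLong(digits) * 1024;
            default:  return Long.parseLong(s) / (1024 * 1024); // plain bytes
        }
    }

    /** Sum of several heap limits, e.g. two JVMs that run at the same time. */
    static long combinedMiB(String... sizes) {
        long total = 0;
        for (String size : sizes) {
            total += toMiB(size);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(combinedMiB("1536m", "1024m") + " MiB"); // prints "2560 MiB"
    }
}
```

This is a ceiling, not an up-front allocation: as observed in Activity Monitor in this thread, the JVM grows its heap as the build needs it.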

I think I'd leave it with the two, but agree it needs to be documented.

I don't know whether maxJavaMemory is used for anything other than XEP? If not, it might make more sense if it were XEPJavaMemory, or something like that.

Given that I'm a lazy so-and-so, and use Oxygen to run DOT builds, I'd also suggest that Oxygen highlight the XEP memory requirement on the FO processor tab, or the Advanced tab.

Eliot, @drmacro

I'm not sure how this is an issue. It's always been the case that you may have to set the memory higher to handle certain documents. The fact that the memory requirement may be higher in 2.x doesn't really change that.

But things move on. Microsoft has stopped supporting XP, and seems to have moved on from the 32-bit vs. 64-bit memory issues. The Java 8 download page talks in terms of 32-bit and 64-bit browsers, and this seems to be the main justification for maintaining the 32-bit Java version. OK, there are bound to be old PCs out there. I don't know what base PC specs are like, or what typical users have.

The point is, I don't think to myself, "I'm opening up Oxygen, it's a Java application, how much memory does it need?" It just happens. I think it should be more like that for the DOT.

Watching the Activity Monitor, Java doesn't open up the allotted memory all at once, it increases during the process. So, if the settings I used were the default, say, it wouldn't mean that folks had to have PCs with at least 2.5 GB of free memory for DOT with XEP. Smaller docs would only use smaller amounts of memory. BTW, this is still less than the 1.5 GB JVM and 4 GB Launcher that FOP seems to use.

Many thanks to everyone for support, advice and interest. I appreciate it!

@DJBHollis
Author

Could you please add maxJavaMemory to the log.

@DJBHollis
Author

Just for the record, every so often someone asks about this on the XEP email list.

Could folks please consider increasing the value of maxJavaMemory? With modern 64G systems, it's pretty pointless having a low value.

@raducoravu
Member

By default, with Java 8 and newer, the maximum heap size (Xmx) is 1/4 of the physical memory, so maybe we could remove the "Xmx" parameter completely:
https://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
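To check what default a given JVM actually picks when no -Xmx is passed, one could compare `Runtime.maxMemory()` with the machine's physical memory. This sketch relies on `com.sun.management.OperatingSystemMXBean`, which HotSpot/OpenJDK builds provide but which is not part of the standard `java.lang.management` API (its `getTotalPhysicalMemorySize()` is deprecated in newer JDKs); it returns `NaN` when that information is unavailable. The reported heap is usually a little below the configured maximum, so expect roughly, not exactly, 0.25 when the 1/4 default applies.

```java
import java.lang.management.ManagementFactory;

final class DefaultHeapCheck {
    /** Ratio of the JVM's max heap to physical RAM, or NaN if RAM size is unavailable. */
    @SuppressWarnings("deprecation")
    static double heapToPhysicalRatio() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof com.sun.management.OperatingSystemMXBean)) {
            return Double.NaN; // not a HotSpot/OpenJDK-style bean
        }
        long physical = ((com.sun.management.OperatingSystemMXBean) os)
                .getTotalPhysicalMemorySize();
        if (physical <= 0) {
            return Double.NaN;
        }
        return (double) Runtime.getRuntime().maxMemory() / physical;
    }

    public static void main(String[] args) {
        System.out.printf("max heap / physical RAM = %.2f%n", heapToPhysicalRatio());
    }
}
```

Running it once with no -Xmx and once with, say, -Xmx1536m would show whether dropping the explicit parameter actually gives more headroom on a given machine.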

@jelovirt
Member

My preference would be to run XEP in the same JVM as DITA-OT itself, like we do with FOP. I tried to change this, but I was not able to find XEP API docs that would have allowed me to do this. IIRC it was about static/global configuration that we didn't want to use.

@raducoravu
Member

Because we do not distribute a particular XEP version with Oxygen, the publishing needs to work with almost any XEP version the end user is using. Which would have meant that if you wanted to use XEP in the same JVM process you would have needed to isolate the classloaders, force XEP to use only the libraries (maybe older Xerces libraries for example) which are shipped with it in order for XEP not to use newer libraries shipped with DITA OT and break. So in such cases when trying to use a tool for which you do not control the used version I'm usually a fan of using a separate process for it.
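The classloader isolation described here can be sketched in plain Java with a `URLClassLoader` whose parent is the bootstrap loader (`null`), so classes resolve only from the tool's own jars plus the core JDK, never from the host application's classpath. This is a generic illustration under that assumption, not how DITA-OT or Oxygen actually integrate XEP; the jar list and any entry-point class are hypothetical, and it does not address XEP's static/global configuration mentioned above. On Java 9 and later, `ClassLoader.getPlatformClassLoader()` may be a better parent if the tool needs platform modules such as `java.sql`.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

final class IsolatedToolLoader {
    /**
     * Builds a loader that sees only the given jars and the bootstrap (core JDK) classes,
     * so a tool cannot pick up newer libraries from the host application's classpath.
     */
    static URLClassLoader create(List<File> toolJars) {
        URL[] urls = new URL[toolJars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = toolJars.get(i).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + toolJars.get(i), e);
            }
        }
        // null parent = bootstrap loader only; the application classpath is not visible.
        return new URLClassLoader(urls, null);
    }
}
```

A caller would then load the tool's entry point with `loader.loadClass(...)` and invoke it reflectively, so the tool would use the libraries shipped with it rather than those on DITA-OT's classpath.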

@jelovirt
Member

@raducoravu I agree. Both ways to run XEP should be supported.

6 participants