Lingan
#1
Hi
I need to parse X12 files (plain-text files) into chunks of XML.
X12 files follow a specific format.
Please let me know if Tika is a good option for this.
Thanks
Lingan.
jukka.zitting
#2
Re: Parse huge text files (x12 files) to XML
Tika does not currently support X12 files, but doing so would be a nice new feature. I filed a feature request for that at https://issues.apache.org/jira/browse/TIKA-627.

Note that Tika uses simple XHTML as its output format, so it might not be immediately applicable to your use case. However, it might be possible to achieve the kind of output you are after, for example by applying XSL transformations to Tika's output. This of course depends on how well organized the output from the requested X12 parser would be.
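As a rough illustration of the XSL route, here is a minimal sketch that uses only the JDK's `javax.xml.transform` API. Since Tika has no X12 parser yet, the XHTML shape it consumes (one `<p class="segment">` per X12 segment) is an assumption, as are the `<interchange>`/`<segment>` output element names and the `XhtmlToXml` class name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: assumes an X12 parser emitted one
// <p class="segment"> per X12 segment inside the XHTML body.
class XhtmlToXml {
    // XSLT 1.0 stylesheet that copies each segment paragraph into a <segment> element.
    // exclude-result-prefixes keeps the xhtml namespace prefix off the output.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml' exclude-result-prefixes='h'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<interchange>"
      + "<xsl:for-each select=\"//h:p[@class='segment']\">"
      + "<segment><xsl:value-of select='.'/></segment>"
      + "</xsl:for-each>"
      + "</interchange>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to an XHTML string.
    static String transform(String xhtml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<p class='segment'>ST*850*0001</p>"
            + "<p class='segment'>SE*2*0001</p>"
            + "</body></html>";
        System.out.println(transform(xhtml));
    }
}
```

How useful this is depends entirely on how much structure the eventual X12 parser puts into its XHTML; the richer that markup, the simpler the stylesheet.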

On a design level Tika's streamed parsing model should have no trouble processing even huge files.
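To illustrate why a streaming approach keeps memory flat for huge files, here is a minimal JDK-only sketch that reads X12 segment by segment from a `Reader`. The class and method names are hypothetical, and it assumes the separators are known up front (`*` between elements and `~` after each segment are common choices); real X12 interchanges declare their separators in the ISA header, which this sketch does not parse.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical streaming splitter for X12 text. Assumption: separators are passed in
// rather than read from the ISA header.
class X12SegmentStreamer {
    // Reads one character at a time and hands each complete segment to the callback,
    // so memory use is bounded by the longest segment, not by the file size.
    // Wrap file input in a BufferedReader to avoid per-character I/O calls.
    static void stream(Reader in, char elementSep, char segmentTerm,
                       Consumer<String[]> onSegment) throws IOException {
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == segmentTerm) {
                emit(current, elementSep, onSegment);
            } else {
                current.append((char) c);
            }
        }
        emit(current, elementSep, onSegment); // last segment may lack a terminator
    }

    private static void emit(StringBuilder current, char elementSep,
                             Consumer<String[]> onSegment) {
        // trim() drops the line breaks many files place between segments.
        String segment = current.toString().trim();
        current.setLength(0);
        if (!segment.isEmpty()) {
            // limit -1 keeps trailing empty elements, e.g. in "N1*ST**"
            onSegment.accept(segment.split(Pattern.quote(String.valueOf(elementSep)), -1));
        }
    }

    public static void main(String[] args) throws IOException {
        List<String[]> segments = new ArrayList<>();
        stream(new StringReader("ST*850*0001~\nBEG*00*SA~\nSE*3*0001~"), '*', '~', segments::add);
        for (String[] s : segments) {
            System.out.println(s[0] + " has " + (s.length - 1) + " elements");
        }
    }
}
```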

Lingan
#3
Thanks for the reply.
I understand that Tika is a transformation and extraction tool that builds on other existing APIs.
As you know, X12 is a plain-text format.
What I need to do is parse each segment of the X12 file and generate XML.

1. So I wanted to know whether Tika is a good option for parsing based on some custom rules.
2. Then, as you mentioned, I could apply a transformation to convert the XHTML into XML, or
is it possible to configure Tika to output XML directly?

Thanks.
chris.mattmann
#4
Hi Lingan,

I think that the best approach in your case would be to wire up a segmented X12 Parser in Tika. Your Parser will be handed a java.io.InputStream, and you can create, e.g., a Reader for that stream and parse out each segment of the X12 file. Then you just need to use one of Tika's existing ContentHandlers (or a plain ol' Java SAX ContentHandler, or write your own) to start emitting the XHTML that you desire.

You can learn more about how to do this in Chapters 8 and 11.
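A rough sketch of that approach, written against plain JDK SAX classes so it runs without Tika on the classpath: it reads the `InputStream` through a `Reader`, splits out each segment, and emits one XHTML `<p>` per segment to a `ContentHandler`. In an actual Tika parser this logic would sit inside an implementation of Tika's `Parser` interface (and would likely use Tika's `XHTMLContentHandler` helper); the class name, the hard-coded `*`/`~` separators, the US-ASCII decoding, and the `class="segment"` markup are all assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical JDK-only sketch of "parse a stream, emit XHTML to a ContentHandler".
// Not Tika's Parser API; '~' as segment terminator is a hard-coded assumption.
class X12XhtmlSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    static void parse(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        // US-ASCII is an assumption about the input encoding.
        Reader reader = new BufferedReader(
            new InputStreamReader(stream, StandardCharsets.US_ASCII));
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        start(handler, "html", new AttributesImpl());
        start(handler, "body", new AttributesImpl());
        StringBuilder segment = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '~') {
                writeSegment(handler, segment);
            } else {
                segment.append((char) c);
            }
        }
        writeSegment(handler, segment); // last segment may lack a terminator
        end(handler, "body");
        end(handler, "html");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    // Emits one X12 segment as <p class="segment">raw segment text</p>.
    private static void writeSegment(ContentHandler h, StringBuilder segment)
            throws SAXException {
        String text = segment.toString().trim();
        segment.setLength(0);
        if (text.isEmpty()) return;
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "class", "class", "CDATA", "segment");
        start(h, "p", attrs);
        h.characters(text.toCharArray(), 0, text.length());
        end(h, "p");
    }

    private static void start(ContentHandler h, String name, AttributesImpl attrs)
            throws SAXException {
        h.startElement(XHTML, name, name, attrs);
    }

    private static void end(ContentHandler h, String name) throws SAXException {
        h.endElement(XHTML, name, name);
    }

    // Serializes the SAX events to a string with the JDK's identity TransformerHandler.
    static String toXhtml(String x12) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        parse(new ByteArrayInputStream(x12.getBytes(StandardCharsets.US_ASCII)), serializer);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXhtml("ST*850*0001~\nSE*2*0001~"));
    }
}
```

Because `parse` only talks to the `ContentHandler` interface, the same method could just as well feed a custom handler that writes domain-specific XML instead of serializing XHTML.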

HTH,
Chris
Lingan
#5
Hi
This is regarding the JIRA request that was created for handling the X12 format.
I saw the latest thread about this request.
Here is a sample of the X12 format, although I think the full spec actually has to be purchased:

http://www.xtranslator.com/prod/beginguidex12.pdf
chris.mattmann
#6
Thanks for the pointer!