Hello, new here, had a quick question.
I have PDF files, OCR text files, and a file that has the coordinates for each word in the PDF. My goal is to combine the text with the PDFs so I can have searchable PDFs. Is there a part of the API or a chapter of the book I should look at more closely?

It is an interesting little deal. The coordinate files were created by our OCR engine, which is regarded as one of the best available. Here is a little snippet from one of them:

11 means 3500 928 3936 1020
12 and 4016 908 4276 1016
13 from 4352 904 4688 1012
14 which 536 1064 944 1172
15 the 1004 1064 1232 1172
16 principal 1284 1060 1892 1184
17 part 1948 1060 2244 1184
18 of 2308 1060 2440 1164
19 the 2504 1056 2720 1168
20 production 2780 1060 3524 1184
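
To make the layout of that snippet concrete, here is a minimal Python sketch of how such lines could be read and mapped into PDF units. It rests on several assumptions that the snippet alone does not confirm: that the columns are index, word, left, top, right, bottom; that the values are pixels measured from the top-left corner of the scanned page; and that the scan resolution is 600 dpi. The names `WordBox`, `parse_line`, and `to_pdf_points`, and the `page_height_px` parameter, are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    index: int
    text: str
    left: int
    top: int
    right: int
    bottom: int


def parse_line(line: str) -> WordBox:
    """Parse one coordinate line.

    Assumed column order: index, word, left, top, right, bottom.
    """
    # Split the four numbers off the right-hand side first, so a word
    # that itself contains whitespace would not shift the columns.
    head, left, top, right, bottom = line.rsplit(None, 4)
    index, text = head.split(None, 1)
    return WordBox(int(index), text, int(left), int(top), int(right), int(bottom))


def to_pdf_points(box: WordBox, page_height_px: int, dpi: int = 600):
    """Convert a pixel box to (x, y, width, height) in PDF points.

    Assumes pixel coordinates with the origin at the top-left of the page.
    PDF default user space has its origin at the bottom-left and uses
    units of 1/72 inch, so the vertical axis has to be flipped.
    The dpi default is an assumption, not a value from the OCR files.
    """
    scale = 72.0 / dpi
    x = box.left * scale
    y = (page_height_px - box.bottom) * scale  # lower edge of the word
    width = (box.right - box.left) * scale
    height = (box.bottom - box.top) * scale
    return x, y, width, height


if __name__ == "__main__":
    sample = "11 means 3500 928 3936 1020"
    box = parse_line(sample)
    # 6600 px is a made-up page height (11 inches at the assumed 600 dpi).
    print(box, to_pdf_points(box, page_height_px=6600))
```

The resulting rectangle is the kind of position a PDF library would need in order to place each word as hidden text over the scanned image; which library call does that placement is exactly what I am asking about, so it is left out here.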

I'll give the mailing list a shot as well. The other, not-so-fun solution is to re-run everything to output searchable PDFs from the OCR engine, which with 3 machines and our data set (33 million pages) will take about 4-5 months.