How to index PDF files

Hello,

I'm currently trying to index PDF files, without success.
Could anyone explain how it works?

Thank you.

You need to extract the plain text from the PDF first, using a PDF-parsing library or tool, and then index that text.
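A minimal Ruby sketch of the extraction step, assuming the pdftotext command-line tool (shipped with Poppler/Xpdf) is installed. `pdf_to_text` is a hypothetical helper name, not part of any library:

```ruby
require 'open3'

# Extract the plain text of a PDF by running the pdftotext CLI.
# Assumption: pdftotext is installed; pass its full path via `binary:`
# if it is not on the PATH.
def pdf_to_text(pdf_path, binary: 'pdftotext')
  # -q suppresses messages; the trailing "-" writes the text to stdout.
  # Passing each argument separately bypasses the shell, so spaces or
  # quotes in the file name cannot break the command.
  stdout, stderr, status = Open3.capture3(binary, '-q', pdf_path.to_s, '-')
  unless status.success?
    raise "#{binary} exited with status #{status.exitstatus}: #{stderr}"
  end
  stdout
end
```

The returned string, e.g. from `pdf_to_text('manual.pdf')`, is what you would then hand to your indexer.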

Nathan Li wrote:

You need to extract the plain text from the PDF first, using a PDF-parsing library or tool, and then index that text.

Thank you for your quick answer :)
Which library should I use, and do you know of a short tutorial?

Sébastien Mizrahi wrote:

Thank you for your quick answer :)
Which library should I use, and do you know of a short tutorial?

I use the command-line tool "pdftotext" for this, which I put into
lib/bin inside my app.

Add a method to your model that returns the extracted text, and add it to your indexed fields.

For example:

def text
  path = 'path/to/your/file.pdf'
  # -q: quiet mode; the trailing "-" makes pdftotext print the text to stdout
  `#{RAILS_ROOT}/lib/bin/pdftotext -q "#{path}" -`
end
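The method above hardcodes a single path. A slightly fuller sketch, assuming each record knows where its own PDF lives; the class name `PdfDocument` and the `pdf_path` attribute are hypothetical, and how you register `text` as an indexed field depends on the search plugin you use:

```ruby
require 'open3'

# Hypothetical model-like class: in Rails this would be your ActiveRecord
# model, and `pdf_path` an attribute storing where the file lives.
class PdfDocument
  attr_reader :pdf_path

  def initialize(pdf_path, pdftotext: 'pdftotext')
    @pdf_path = pdf_path
    # Following the lib/bin setup above, this could be
    # "#{RAILS_ROOT}/lib/bin/pdftotext" (assumption).
    @pdftotext = pdftotext
  end

  # Text to hand to the search indexer. Memoized so the PDF is converted
  # only once per object; returns "" when pdftotext exits with an error,
  # so one broken file does not abort indexing.
  def text
    @text ||= begin
      out, status = Open3.capture2(@pdftotext, '-q', pdf_path, '-')
      status.success? ? out : ''
    end
  end
end
```

Usage: `PdfDocument.new('path/to/your/file.pdf').text`. Passing the arguments as a list instead of a backtick string also avoids shell-quoting problems with unusual file names.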

ralf