What is the best way to index thousands of documents for use in an ASP.NET 2.0...

Discussion in 'Digital Media' started by Meridian Q, May 9, 2008.

  1. Meridian Q Guest

    ...web site? I want to be able to add and delete documents (mainly PDFs) and let my users add documents as part of a members-based service. What is the best way to index these documents as they are uploaded or donated, or where they already exist in certain folders? Basically, I am looking for a program that indexes them into a SQL DB or XML file, so that I can then write code to search the indexed information and pull up the correct documents. The basic need is a tool that indexes the content, giving me a DB to query by keyword so I can show users the documents most relevant to their search text. Is there a program or plug-in of some kind that already indexes PDFs etc. and places the results into a DB or XML format, so I can use this info to build my own custom search script and results layout and let users open the docs if they find what they need?
    Indexing = file name, file extension, and keywords within the document. I know MS can index the document names, which is fine, but if I have a PDF called meridian.pdf that is actually about particular part numbers or equipment names, I need to know that as well. So I am looking for something that also indexes the content within the documents, not just the file names.
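
    To make the requirement concrete, here is a rough sketch of the kind of index described above: file name, extension, and per-document keyword counts in a SQL table, plus a keyword search ranked by hit count. It is written in Python with SQLite only to keep it short and self-contained; it assumes the text has already been extracted from each PDF by some other tool, and the table names and the `build_index` and `search` helpers are made up for illustration.

```python
import os
import re
import sqlite3


def build_index(conn, documents):
    """Index (path, extracted_text) pairs: file name, extension, keyword counts."""
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, name TEXT, ext TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS keywords (doc_id INTEGER, word TEXT, hits INTEGER)")
    for path, text in documents:
        name = os.path.basename(path)
        ext = os.path.splitext(name)[1].lower()
        cur = conn.execute("INSERT INTO docs (name, ext) VALUES (?, ?)", (name, ext))
        # Count each word; hyphens are kept so part numbers like "px-200" stay whole.
        counts = {}
        for word in re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()):
            counts[word] = counts.get(word, 0) + 1
        conn.executemany(
            "INSERT INTO keywords (doc_id, word, hits) VALUES (?, ?, ?)",
            [(cur.lastrowid, w, n) for w, n in counts.items()],
        )
    conn.commit()


def search(conn, query):
    """Return file names ranked by total keyword hits for the query terms."""
    terms = re.findall(r"[a-z0-9][a-z0-9-]*", query.lower())
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    rows = conn.execute(
        "SELECT d.name, SUM(k.hits) AS score FROM keywords k "
        "JOIN docs d ON d.id = k.doc_id "
        f"WHERE k.word IN ({placeholders}) "
        "GROUP BY d.id ORDER BY score DESC, d.name",
        terms,
    ).fetchall()
    return [name for name, _ in rows]
```

    With an index like this, a search for a part number would find meridian.pdf even though the part number is not in the file name.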
  2. Microsoft Indexing Service, a default service found on all Windows Server 2003 OS packages, would be the best way to go.
  3. Google and others do it! Use Google on your own server!
  4. Smutty Guest

    This is a wide topic.

    The documents can be stored in a variety of ways. One of them is storing them as binary data in MS SQL Server, which has advantages and disadvantages. The advantages are security and SQL Server's transaction support. The disadvantage is that retrieval can be slow: SQL Server stores data in 8 KB pages, so retrieving a document generates a lot of input/output.

    Another approach is to store the documents on an FTP server and simply store references to your documents in the DB. FTP servers are much faster, and they are built specifically for file transfer.
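
    As a rough illustration of the "file outside, reference inside" idea, the sketch below copies each file into a storage folder and keeps only its name and location in the database. It uses Python and SQLite purely for brevity, and a local directory stands in for the FTP server; the `documents` table and the `store_document` and `open_document` helpers are hypothetical names, not part of any product.

```python
import os
import shutil
import sqlite3


def store_document(conn, source_path, storage_dir):
    """Copy a file into storage_dir and record only its location in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(id INTEGER PRIMARY KEY, name TEXT, location TEXT)"
    )
    os.makedirs(storage_dir, exist_ok=True)
    name = os.path.basename(source_path)
    cur = conn.execute("INSERT INTO documents (name, location) VALUES (?, '')", (name,))
    doc_id = cur.lastrowid
    # Prefix the stored file with the row id so identical names never collide.
    location = os.path.join(storage_dir, f"{doc_id}_{name}")
    shutil.copyfile(source_path, location)
    conn.execute("UPDATE documents SET location = ? WHERE id = ?", (location, doc_id))
    conn.commit()
    return doc_id


def open_document(conn, doc_id):
    """Look up the stored location and return the file's bytes."""
    row = conn.execute("SELECT location FROM documents WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    with open(row[0], "rb") as handle:
        return handle.read()
```

    The database row stays small no matter how large the document is, which is the main point of this approach; the trade-off is that the files and the rows are no longer covered by one transaction.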

    There are lots of products that offer document management & workflow management. You can also try to use one instead of trying to reinvent the wheel.

    Hope this helps.
