Author Topic: Strange error from Koders search engine (Read 2429 times)

polonus · « **on:** January 27, 2007, 09:37:15 PM »

Hi malware fighters,

Browser code security is Polonus's fancy, so he occasionally visits the Koders search engine. But now, when clicking on one of the search results, this error is returned:
===============
An exception has occurred in this application.
read past EOF

Report this Error

http://www.koders.com/kv.aspx?fid=1222FAC6C7FCB873B5321E9D8469F2B2169956F4&s=IndexWriter

System.IO.IOException: read past EOF
at Lucene.Net.Store.FSInputStream.ReadInternal(Byte[] b, Int32 offset, Int32 len)
at Lucene.Net.Store.InputStream.ReadBytes(Byte[] b, Int32 offset, Int32 len)
at Lucene.Net.Index.SegmentReader.Norms(String field, Byte[] bytes, Int32 offset)
at Lucene.Net.Index.SegmentReader.Norms(String field)
at Lucene.Net.Search.TermQuery.TermWeight.Scorer(IndexReader reader)
at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 nDocs)
at Lucene.Net.Search.MultiSearcher.Search(Query query, Filter filter, Int32 nDocs)
at Lucene.Net.Search.Hits.GetMoreDocs(Int32 min)
at Lucene.Net.Search.Hits..ctor(Searcher s, Query q, Filter f)
at Lucene.Net.Search.Searcher.Search(Query query)
at Koders.KodeShare.WebServer.Core.Search.SearchUtil.Search(Query q)
at Koders.KodeShare.WebServer.Controls.KodeViewer.Page_Load(Object sender, EventArgs e)
at System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e)
at System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e)
at System.Web.UI.Control.OnLoad(EventArgs e)
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
==============
Where is this 500 error code coming from? Are they experiencing problems with their Lucene IndexWriter, or is my browser too smart, reading too many lines? Who can come up with an explanation? I think it is something on the Koders server side, but I would like to be sure, because with Google CodeSearch I have no problems.
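
For illustration only (this is not Koders' or Lucene.Net's code): the trace ends in FSInputStream.ReadInternal, called from SegmentReader.Norms, so the exception was raised on the server while it read one of its own index files and asked for more bytes than the file held. Why the file was short cannot be told from the trace. The Java sketch below reproduces that class of failure with plain java.io. The class name ReadPastEofDemo, the helper readExactly, and the 6 and 10 byte sizes are all made up for the example.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

class ReadPastEofDemo {
    // Reads exactly `expected` bytes, the way a reader does when it trusts
    // a length recorded somewhere else (hypothetical helper).
    static byte[] readExactly(File file, int expected) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            byte[] buf = new byte[expected];
            in.readFully(buf); // throws EOFException if the file is shorter
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("norms", ".bin");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[6]); // only 6 bytes actually on disk
        }
        try {
            readExactly(f, 10); // the reader expects 10 bytes
            System.out.println("read ok");
        } catch (EOFException e) {
            System.out.println("read past EOF: wanted 10 bytes, file has " + f.length());
        }
    }
}
```

The browser never takes part in this read; it only receives the resulting error page, which is consistent with the problem being on the server side.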

polonus

polonus · « **Reply #1 on:** January 28, 2007, 06:19:59 PM »

Hi ye all,

It was something on their part (the Koders web server). Now it is working fine, so there is no reason to incriminate a bug in Lucene's IndexWriter. Installing this could help: Index: Mono.Security.Protocol.Tls/TlsClientSettings.cs
Lucene is also used in the Flock browser. An IndexWriter example:

Code:

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import java.io.File;
import java.io.BufferedReader;
import java.io.FileReader;
import java.text.DecimalFormat;
import java.util.Date;
import java.util.Vector;
import java.util.Collections;

public class LuceneIndexer {
  private File corpusDir = new File("extracted_corpus");
  private File indexDir = new File("lucene_index");

  public LuceneIndexer() { }

  public static void main (String[] args) throws Exception {
    LuceneIndexer indexer = new LuceneIndexer();

    // index all docs unless otherwise spec'd
    int maxToIndex = args.length > 0 ?
      Integer.parseInt(args[0]) : 0;

    // verify that we're running from the right directory
    String curDir = new File(".").getCanonicalPath();
    if (!curDir.endsWith("benchmarks"))
      throw new Exception("Must be run from benchmarks/ ");

    // assemble the sorted list of article files
    String[] fileList = indexer.buildFileList();

    // start the clock and build the index
    long start = new Date().getTime();
    int numIndexed = indexer.buildIndex(fileList, maxToIndex);

    // stop the clock and print a report
    long end = new Date().getTime();
    indexer.printReport(start, end, numIndexed);
  }

  // Return a lexically sorted list of all article files from all
  // subdirs.
  private String[] buildFileList () throws Exception {
    File[] articleDirs = corpusDir.listFiles();
    Vector filePaths = new Vector();
    for (int i = 0; i < articleDirs.length; i++) {
      File[] articles = articleDirs[i].listFiles();
      for (int j = 0; j < articles.length; j++) {
        String path = articles[j].getCanonicalPath();
        if (path.indexOf("article") == -1)
          continue;
        filePaths.add(path);
      }
    }
    Collections.sort(filePaths);
    return (String[])filePaths.toArray(new String[filePaths.size()]);
  }

  // Build an index, stopping at maxToIndex docs if maxToIndex > 0.
  private int buildIndex (String[] fileList, int maxToIndex)
      throws Exception {
    IndexWriter writer = new IndexWriter(indexDir,
      new WhitespaceAnalyzer(), true);

    int docsSoFar = 0;
    for (int i = 0; i < fileList.length; i++) {
      // add content to index
      File f = new File(fileList[i]);
      Document doc = this.nextDoc(f);
      writer.addDocument(doc);

      // bail if we've reached spec'd number of docs
      if (maxToIndex > 0 && ++docsSoFar == maxToIndex)
        break;
    }

    // finish index
    int numIndexed = writer.docCount();
    writer.optimize();
    writer.close();

    return numIndexed;
  }

  // Retrieve an article, parse it, and return a Lucene Document.
  private Document nextDoc(File f) throws Exception {
    // the title is the first line, the body is the rest
    BufferedReader br = new BufferedReader(new FileReader(f));
    String title;
    if ( (title = br.readLine()) == null)
      throw new Exception("Failed to read title");
    StringBuffer buf = new StringBuffer();
    String str;
    while ( (str = br.readLine()) != null )
      buf.append( str );
    br.close();
    String body = buf.toString();

    // add title and body to doc
    Document doc = new Document();
    Field titleField = new Field("title", title,
      Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO);
    Field bodyField = new Field("body", body,
      Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO);
    doc.add(titleField);
    doc.add(bodyField);

    return doc;
  }

  // Print out stats for this run.
  private void printReport(long start, long end, int numIndexed) {
    float secs = (float)(end - start) / 1000;
    DecimalFormat format = new DecimalFormat("#,##0.00");
    String secString = format.format(secs);
    Package lucenePackage = org.apache.lucene.LucenePackage.get();
    String version = lucenePackage.getSpecificationVersion();
    System.out.println("Java Lucene " + version
      + " DOCS: " + numIndexed + " SECS: " + secString);
  }
}
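
One detail in nextDoc deserves a closer look: readLine() strips the line terminator, and the loop appends each line with no separator, so the last word of one line and the first word of the next are glued together before a whitespace-based analyzer sees them. Whether that matters for a benchmark is a judgment call. Below is a minimal sketch of the same title/body split that keeps the words apart, using only java.io so it runs without Lucene. TitleBodyParser and parse are hypothetical names, not part of the listing or of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class TitleBodyParser {
    // Returns {title, body}: the title is the first line, the body is the
    // remaining lines joined with a single space.
    static String[] parse(BufferedReader br) throws IOException {
        String title = br.readLine();
        if (title == null)
            throw new IOException("Failed to read title");
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            if (body.length() > 0)
                body.append(' '); // keep tokens on adjacent lines separate
            body.append(line);
        }
        return new String[] { title, body.toString() };
    }

    public static void main(String[] args) throws IOException {
        String article = "Some title\nfirst line ends here\nnext line starts";
        String[] parts = parse(new BufferedReader(new StringReader(article)));
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

With the original loop, the sample body would contain the single token "herenext"; with the space inserted, "here" and "next" stay separate.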

polonus