20101113

Google ready to kill software patents?

Google on Oracle vs Google: "Each of the Patents-in-Suit is invalid under 35 U.S.C. § 101 because one or more claims are directed to abstract ideas or other non-statutory subject matter."
Kudos, Google! Refusing software patents like this is the right thing to do for innovation!
More at Groklaw: http://www.groklaw.net/article.php?story=20101111114933605

20101101

Java finally adds NIO2.

Java 7 comes with NIO2, "New I/O version 2". Stupid name, I know, but it packs some extremely important functions.
These new functions will enable us to do faster indexing, trace changes in file systems and read more file attributes, such as users and groups.
I have been waiting for this since early 2002, when the proposal for NIO2 came. I had almost given up hope on Java since then.
This makes it possible to update some core I/O functions in corpus and in our public Java libraries for indexing.
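
To give an idea of what this looks like in practice, here is a minimal sketch of reading owner and group while walking a folder tree, assuming the java.nio.file API as shipped in Java 7. The class name Nio2AttributeSketch and the output format are made up for illustration; this is not code from corpus or our libraries.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.PosixFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only meant to illustrate the NIO2 calls.
class Nio2AttributeSketch {

    /** Builds one line of metadata an indexer might want to store for a file. */
    static String describe(Path file) throws IOException {
        BasicFileAttributes basic = Files.readAttributes(file, BasicFileAttributes.class);
        String owner = Files.getOwner(file).getName();
        String group = "n/a";
        // The group is only readable where the file system offers the POSIX view.
        if (file.getFileSystem().supportedFileAttributeViews().contains("posix")) {
            group = Files.readAttributes(file, PosixFileAttributes.class).group().getName();
        }
        return file.getFileName() + " size=" + basic.size()
                + " modified=" + basic.lastModifiedTime()
                + " owner=" + owner + " group=" + group;
    }

    /** Walks the tree below root and describes every regular file found. */
    static List<String> describeTree(Path root) throws IOException {
        final List<String> lines = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                lines.add(describe(file));
                return FileVisitResult.CONTINUE;
            }
        });
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String line : describeTree(root)) {
            System.out.println(line);
        }
    }
}
```

On file systems without POSIX support (Windows, typically) the group simply shows up as "n/a" in this sketch; a real indexer would probably read the ACL view there instead.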

20100107

Gentle local file indexing, please

Unlike websites, local file systems tend to give much better feedback on file changes. Still, most search solutions use considerable I/O, which is very annoying. Users get annoyed to the extent that they uninstall the search and indexing tool altogether. I've done that with Google, Microsoft and other desktop search tools too.

Still, I know that there is a difficult balance to all this. Today there is good OS support for file events. I recently read this post about using .NET APIs to monitor changes in file systems. On Linux, inotify or the kernel audit daemon auditd can do the same by listening to kernel events. The manual, OS-independent method is to watch the modified time stamp for changes on all folders. The worst case is having to scan the entire folder tree for changes, as if virus scans were not annoying enough on their own.
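
For the Java side of things, the same kind of event listening can be sketched with the WatchService from java.nio.file, assuming Java 7 or later. This is only an illustration of the idea: the class WatchSketch and its two methods are my own hypothetical names, and the text lines it produces are not any standard format.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only meant to illustrate OS file events from Java.
class WatchSketch {

    /** Registers one folder for create, modify and delete events. Not recursive. */
    static WatchKey watch(WatchService watcher, Path dir) throws IOException {
        return dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    /** Collects the events queued on a key as text lines and re-arms the key. */
    static List<String> drain(WatchKey key) {
        List<String> lines = new ArrayList<>();
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                // Events were lost, so only a scan of the folder can catch up.
                lines.add("OVERFLOW rescan needed");
                continue;
            }
            // context() is the path of the changed entry, relative to the folder.
            lines.add(event.kind().name() + " " + event.context());
        }
        key.reset(); // without this the key delivers no further events
        return lines;
    }
}
```

A listener would create the service with FileSystems.getDefault().newWatchService(), call watch() for every folder of interest, and then loop on watcher.take() followed by drain(). Since a registration covers a single folder only, every subfolder needs its own registration.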

The event monitoring solutions work as long as they are on, but changes go unnoticed while the listening agent is off, and it needs to fall back to scanning for changes when it is switched on again, costing considerable and annoying I/O activity. Then comes the indexing, which really picks up the I/O...
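
The fall-back scan could look roughly like the sketch below, assuming the agent stored a time stamp when it was last running and that Java 7's java.nio.file is available. CatchUpScan and changedSince are hypothetical names. Note that a scan like this only finds new and modified files; deletions need a comparison against the index itself.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only meant to illustrate the catch-up scan.
class CatchUpScan {

    /** Returns every file below root that was modified after lastSeen. */
    static List<Path> changedSince(Path root, final FileTime lastSeen) throws IOException {
        final List<Path> changed = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // The walk already hands us the attributes, so no extra stat call per file.
                if (attrs.lastModifiedTime().compareTo(lastSeen) > 0) {
                    changed.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return changed;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumption for the demo: the agent was last running 24 hours ago.
        FileTime lastSeen = FileTime.fromMillis(System.currentTimeMillis() - 24L * 60 * 60 * 1000);
        for (Path p : changedSince(root, lastSeen)) {
            System.out.println(p);
        }
    }
}
```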

Solutions:
* Late indexing and grouped indexing before searching; just log changes until then.
* While idle, create a low-priority process for indexing groups of changed/new documents in one phase.
* Push and convert documents by type to be nice to I/O, e.g. by converting all doc files to XML at once.
* Push indexing to a remote server to reduce the load.
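
The first and third points can be sketched together: log changed paths cheaply, and only when it is time to index hand them out grouped by file type. This is a minimal sketch under my own assumptions. ChangeLog, record and drainByType are hypothetical names, and the file extension is used as a crude stand-in for the document type.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, only meant to illustrate late, grouped indexing.
class ChangeLog {

    // A set, so a file that changes ten times is still indexed once.
    private final Set<Path> pending = new LinkedHashSet<>();

    /** Cheap call for the event listener: just remember the path. */
    synchronized void record(Path changed) {
        pending.add(changed);
    }

    /** Hands out all pending paths grouped by lower-case extension and clears the log. */
    synchronized Map<String, List<Path>> drainByType() {
        Map<String, List<Path>> groups = new TreeMap<>();
        for (Path p : pending) {
            groups.computeIfAbsent(extensionOf(p), k -> new ArrayList<>()).add(p);
        }
        pending.clear();
        return groups;
    }

    /** Returns "" for names without an extension; assumes the path has a file name. */
    static String extensionOf(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.record(Paths.get("docs", "report.doc"));
        log.record(Paths.get("docs", "slides.pdf"));
        log.record(Paths.get("docs", "notes.DOC"));
        for (Map.Entry<String, List<Path>> group : log.drainByType().entrySet()) {
            System.out.println(group.getKey() + " -> " + group.getValue());
        }
    }
}
```

The drain would be called from the idle-time worker or right before a search, so each converter is started once per batch instead of once per file.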

Anything else to consider for desktop/enterprise search?
Jonas

20091207

Experimenting with Lucene 3 and parametric search

I have experimented with Apache Lucene 3 and parametric search:

It's just a test shot, but it:
* Gives you all search results on a single page - as a speed test
* Summarizes inventors, applicants, patent-classes etc from the result as an overview.
* Is a playground for further JavaScript interaction with the results.
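
The summarizing step can be sketched in plain Java, independent of Lucene: count how often each value of a field occurs across the hits and show the most frequent ones. This is not the code behind the test shot, just an illustration under my own assumptions. FacetSummary is a hypothetical name, each hit is modelled as a simple map from field name to values, and the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, only meant to illustrate a parametric overview of search hits.
class FacetSummary {

    /** Counts how many times each value of the given field occurs across all hits. */
    static Map<String, Integer> countField(List<Map<String, List<String>>> hits, String field) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, List<String>> hit : hits) {
            List<String> values = hit.get(field);
            if (values == null) {
                continue; // this hit has no such field
            }
            for (String value : values) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the n most frequent values, ties broken alphabetically. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Invented sample hits; field names and values are placeholders.
        List<Map<String, List<String>>> hits = new ArrayList<>();
        hits.add(Collections.singletonMap("inventor", Arrays.asList("A. Example", "B. Example")));
        hits.add(Collections.singletonMap("inventor", Arrays.asList("A. Example")));
        for (Map.Entry<String, Integer> e : top(countField(hits, "inventor"), 10)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```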

Indexing was kind of slow on my quad core: 120k patents took about 40 minutes, but searching is fast. It made me think about setting up some test suites for indexing, like:

* Solr/Lucene for all Swedish municipal sites (250+)
* Solr/Lucene for a typical Windows server environment.

Anyone done this already perhaps?

/jonas



20090102

New hope for system wide secure memory with G1-GC

I had this idea years ago about segmenting GC memory into small process islands and using proxies between them, in order to speed up and use global, yet secure, memory addressing system-wide. The idea is not brand new, but it combines the best of distributed memory models with tightly integrated memory and security, or so was my idea back in 2003 at least. It didn't sell well though. Yep, I tried!

But here comes new hope, the new "G1" GC for Java 1.7:
http://tech.puredanger.com/2008/05/09/javaone-g1-garbage-collector

With better languages on top of the JVM, like Scala and Groovy, it seems Java might have a chance to bring us this cool platform, even though it has considerable legacy problems.

/jonas

20080924

BBC reports on study saying that patents at universities are blocking innovation

The BBC reports that the Canada-based Innovation Partnership, a non-profit consultancy, states in a newly released report that "'Blocking patents' are delaying advances in cancer medicine and food crops" and that "the full benefits of synthetic biology and nanotechnology will not be realised without urgent reforms to encourage sharing of information".
BBC article: http://news.bbc.co.uk/2/hi/science/nature/7632318.stm
The full report is available (PDF, CC license) at http://www.theinnovationpartnership.org/en/ieg/report/