Where cows roam free...

Wednesday, October 26, 2011

Lessons learned from Mona Lisa (part 2)

    I learned yet another lesson about why it's extremely important to give the fitness evaluator the exact same image that is shown to the user.

    If you recall, the original problem I reported was that the candidate image was being scaled down before evaluation, thereby allowing many pixels to avoid detection/correction. Yesterday I enabled anti-aliasing for the image shown to the user. I figured that so long as the evaluator was seeing 100% of the pixels (no scaling going on) I should be safe, but it turns out this isn't enough.

    Take a look at the following images:

Anti-aliasing Enabled


Anti-Aliasing Disabled

They show the exact same candidates with anti-aliasing enabled and disabled. Notice how the anti-aliased version has "streaks" of errors across the face, similar to the problem I was seeing when the candidate was being scaled. It turns out that candidates sometimes contain polygons that introduce errors into the image in the form of "streaks" (polygons rendered with sub-pixel precision). The interesting thing is that aliasing (rendering without anti-aliasing) suppresses these errors, so the evaluator function does not see them. Consequently, the user sees a whole bunch of errors that the fitness function will never fix. Sound familiar?

In conclusion: you should always (always!) pass the fitness function the exact same image you display to the user. Better safe than sorry :)
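As a rough illustration, here is a minimal sketch of that rule. The class and method names are hypothetical (they are not from my actual code), and I'm assuming a fitness score where lower is better, computed as the sum of squared per-channel differences. The point is that render() is called once, with the same anti-aliasing setting used on screen, and the resulting image is handed to both the display and fitness().

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Polygon;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class DisplayedImageFitness
{
  /**
   * Hypothetical renderer: draws a single polygon using the given anti-aliasing setting.
   * The returned image is meant to be both displayed and evaluated.
   */
  public static BufferedImage render(Polygon polygon, Color color, int width, int height,
    boolean antiAlias)
  {
    BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    Graphics2D g = image.createGraphics();
    try
    {
      g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
        antiAlias ? RenderingHints.VALUE_ANTIALIAS_ON : RenderingHints.VALUE_ANTIALIAS_OFF);
      g.setColor(color);
      g.fillPolygon(polygon);
    }
    finally
    {
      g.dispose();
    }
    return image;
  }

  /**
   * Assumed fitness measure: sum of squared red/green/blue differences over every pixel.
   * 0 means the images are identical; lower is better.
   */
  public static long fitness(BufferedImage candidate, BufferedImage target)
  {
    if (candidate.getWidth() != target.getWidth() || candidate.getHeight() != target.getHeight())
      throw new IllegalArgumentException("Images must have the same size (no scaling)");
    long error = 0;
    for (int y = 0; y < target.getHeight(); ++y)
    {
      for (int x = 0; x < target.getWidth(); ++x)
      {
        int c = candidate.getRGB(x, y);
        int t = target.getRGB(x, y);
        int red = ((c >> 16) & 0xFF) - ((t >> 16) & 0xFF);
        int green = ((c >> 8) & 0xFF) - ((t >> 8) & 0xFF);
        int blue = (c & 0xFF) - (t & 0xFF);
        error += red * red + green * green + blue * blue;
      }
    }
    return error;
  }
}
```

Because fitness() refuses images of different sizes, it also guards against the scaling problem from the earlier post.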

Friday, October 21, 2011

Lessons learned from Mona Lisa

I've been studying Genetic Algorithms using Roger Alsing's Evolution of Mona Lisa algorithm over the past month and made some interesting discoveries:

Using opaque polygons improves rendering performance (and by extension the performance of the fitness function) by an order of magnitude.
In all things, favor many small changes over drastic large changes...
When adding a new polygon, give it a size of 1 pixel instead of assigning it vertexes with random coordinates. This improves its chances of survival.
When adding a new vertex, instead of dropping it into a random position give it the same position as an existing vertex in the polygon. It won't modify the polygon in any noticeable way, but it will open the door for the "move vertex" mutation to move more vertexes than it could before.
When moving vertexes, favor many small moves (3-10 pixels at a time) instead of trying to span the entire canvas.
If you're going to damp mutations over time, damp the amount of change as opposed to the probability of change.
The effects of damping are minimal. It turns out that if you've followed the above steps (favor small changes) there should be no real need to damp.
Don't use Crossover mutation. It introduces a high mutation rate which is great early on but very quickly high mutation becomes a liability because an image that is mostly converged will reject all but small mutations.
Don't scale the image in the fitness evaluator function. This one took me a while to figure out. If your input image is 200x200 but the fitness evaluator scales the image down to 100x100 before generating a fitness score, it will result in candidate solutions containing streaks/lines of errors that are invisible to the fitness function but are clearly wrong to the end-user. The fitness function should process the entire image or not at all. A better solution is to scale the target image across-the-board so your fitness function is processing 100% of the pixels. If 100x100 is too small to display on the screen, you simply up-scale the image for display. Now the user can see the image clearly and the fitness function isn't missing a thing.
Prevent the creation of self-intersecting polygons. They never yield good results so we can substantially speed up the algorithm by preventing mutations from creating them. Implementing the check for self-intersecting polygons was a pain in the ass but it was worth the trouble in the end.
I've modified the fitness score to remove hidden polygons (purely for performance reasons):

fitness += candidate.size();

I've increased the maximum number of polygons from 50 to 65535.
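To make the "favor small changes" tips above more concrete, here is a minimal sketch of the three mutations. It is one possible interpretation and not my actual code: the class name, the method names, and the choice of a one-pixel triangle for a new polygon are all assumptions made for illustration.

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SmallMutations
{
  /**
   * New polygons start as a tiny triangle roughly 1 pixel in size instead of
   * spanning random coordinates. Assumes width and height are at least 2.
   */
  public static List<Point> newPolygon(Random random, int width, int height)
  {
    int x = random.nextInt(width - 1);
    int y = random.nextInt(height - 1);
    List<Point> vertices = new ArrayList<>();
    vertices.add(new Point(x, y));
    vertices.add(new Point(x + 1, y));
    vertices.add(new Point(x, y + 1));
    return vertices;
  }

  /**
   * New vertices duplicate the position of an existing vertex, so the shape is
   * unchanged until a later "move vertex" mutation pulls the copy away.
   */
  public static void addVertex(List<Point> vertices, Random random)
  {
    int index = random.nextInt(vertices.size());
    vertices.add(index + 1, new Point(vertices.get(index)));
  }

  /**
   * Moves one random vertex by 3-10 pixels along each axis, clamped to the canvas.
   */
  public static void moveVertex(List<Point> vertices, Random random, int width, int height)
  {
    Point vertex = vertices.get(random.nextInt(vertices.size()));
    vertex.x = clamp(vertex.x + smallStep(random), 0, width - 1);
    vertex.y = clamp(vertex.y + smallStep(random), 0, height - 1);
  }

  // Returns a step whose magnitude is between 3 and 10 (inclusive), with a random sign.
  private static int smallStep(Random random)
  {
    int magnitude = 3 + random.nextInt(8);
    return random.nextBoolean() ? magnitude : -magnitude;
  }

  private static int clamp(int value, int min, int max)
  {
    return Math.max(min, Math.min(max, value));
  }
}
```

A real implementation would also reject moves that produce self-intersecting polygons; that check is left out here to keep the sketch short.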

    When I first tried running Watchmaker's Mona Lisa example it would run for days and not look anything close to the target image. Roger's algorithm was better but still stagnated after an hour. Using the new algorithm I managed to recreate the target image in less than 15 minutes. The fitness score reads 150,000 but to the naked eye the candidate looks almost identical to the original.

    I put together a diagnostics display that shows me the entire population evolving over time. It also tells me how many unique candidates are active in the population at any given time. A low number indicates a lack of variance. Either the population pressure is too high or the mutation rates are too low. In my experience, a decent population contains at least 50% unique candidates.

I used this diagnostic display to tune the algorithm. Whenever the number of unique candidates was too low, I'd increase the mutation rate. Whenever the algorithm was stagnating too quickly I'd examine what was going on in the population. Very frequently I'd notice that the mutation amount was too high (colors or vertices moving too quickly).
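The "unique candidates" number is cheap to compute. Here is a minimal sketch, assuming candidates implement equals() and hashCode() so that two identical candidates compare as equal; the class and method names are hypothetical, and the 50% threshold is simply the rule of thumb mentioned above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PopulationDiversity
{
  /**
   * Returns the fraction of the population (0.0 to 1.0) made up of distinct candidates.
   * Relies on the candidates' equals() and hashCode() implementations.
   */
  public static <T> double uniqueFraction(List<T> population)
  {
    if (population.isEmpty())
      return 0.0;
    Set<T> unique = new HashSet<>(population);
    return (double) unique.size() / population.size();
  }

  /**
   * Rule of thumb from the text: a decent population has at least 50% unique candidates.
   */
  public static <T> boolean lacksVariance(List<T> population)
  {
    return uniqueFraction(population) < 0.5;
  }
}
```

When lacksVariance() returns true, the options discussed above apply: lower the population pressure or raise the mutation rate.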

    I'm glad I spent the past month studying this problem. It's taught me a lot about the nature of GAs. It has a lot more to do with design than code optimization. I've also discovered that it's extremely important to watch the entire population evolve in real time as opposed to only studying the fittest candidate. This allows you to discover fairly quickly which mutations are effective and whether your mutation rate is too low or high.

Genetic Algorithms are a lot of fun. I encourage you to play with them yourself.

Tuesday, December 16, 2008

MySQL dates that work across multiple time-zones

Wouldn't it be nice if you could store date information in your database that would be correct regardless of your server and client time-zones (that may themselves change over time)?

One way to do this is to store all date information relative to the GMT time-zone regardless of the server and client time-zones. Whenever a client interacts with a date you simply convert it from GMT to the client's specific time-zone and back as needed.

Well, after 12 hours of investigation, I finally got this to work ;) Here's how:

Configure the server to use the GMT time-zone by inserting default-time-zone=UTC (case sensitive) in my.cnf
Set the useLegacyDatetimeCode MySQL driver parameter to false. This fixes this bug: http://bugs.mysql.com/bug.php?id=15604
When reading/writing dates from/to the database always use UTC dates

That's it.
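For the "convert from GMT to the client's time-zone and back" part, here is a minimal sketch using the java.time API (which is newer than this post, so treat it as an illustration rather than what I used at the time). The class and method names are hypothetical; the database side is assumed to store and return plain UTC date-times.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class UtcDates
{
  /**
   * Converts a wall-clock time in the client's time-zone to the UTC value to store.
   */
  public static LocalDateTime toUtc(LocalDateTime clientTime, ZoneId clientZone)
  {
    return clientTime.atZone(clientZone).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
  }

  /**
   * Converts a stored UTC value back to wall-clock time in the client's time-zone.
   */
  public static LocalDateTime fromUtc(LocalDateTime utcTime, ZoneId clientZone)
  {
    return utcTime.atZone(ZoneOffset.UTC).withZoneSameInstant(clientZone).toLocalDateTime();
  }
}
```

For example, noon on December 16, 2008 in "America/New_York" would be stored as 17:00 UTC, and any client can convert that stored value into its own time-zone when reading it back.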

Wednesday, December 3, 2008

Tracking Java Versions using Google Analytics (part 2)

Update: I just posted (December 4th, 2008) a version that will allow you to differentiate between 1.6.0* the family versus 1.6.0 the version.

I'm experimenting with a new script for tracking Java versions. To use the new code simply add the following two lines after your Google Analytics script:


<script src="http://java.com/js/deployJava.js" type="text/javascript"></script>
<script src="http://jre-analytics.googlecode.com/svn/trunk/jre-analytics.js" type="text/javascript"></script>

I encourage you to sign up to the mailing list to follow future discussions on this topic: http://code.google.com/p/jre-analytics/

Benefits of new script

Easier, auto-updating installation. Linking to my external script means that you get the benefit of future updates without any effort on your part.
Does not conflict with other scripts. The old code uses _setVar() which records at most one variable across an entire user session. Different scripts compete for exclusive access to this variable. The new code uses _trackEvent() which allows multiple values to be recorded in separate namespaces, so different scripts do not interfere with each other's data collection.
Records multiple JRE versions per user. The old script was supposed to do this but the limitations of _setVar() meant it only recorded the latest JRE version.

Details

About 48 hours after you install the script you should see new data under Content -> Event Tracking. Please note that the Event Tracking menu will not show up until you collect enough data.

Please let me know the statistics reported for your site. Post this at http://forums.java.net/jive/thread.jspa?messageID=317425

The new script will record two kinds of events:

detected: indicates whether Java was detected on the user's machine.
version: indicates the specific Java version detected on the user's machine.

This should give you a much better overview of what's really going on. I look forward to your feedback!

Warning

According to http://code.google.com/apis/analytics/docs/eventTrackerGuide.html both the old and new script will cause your page's "bounce rate" to incorrectly get reported as zero. This is a limitation of how _setVar() and _trackEvent() are implemented and there doesn't seem to be anything I can do about it. Sorry :(

Saturday, November 8, 2008

Tracking Java Versions using Google Analytics

Update: This code does not work as expected (it will only report the newest JRE version). Please see the newer version in "Tracking Java Versions using Google Analytics (part 2)".

Ever wonder what Java version your website visitors have installed? Here's how you can leverage Google Analytics to find out:

Locate the Google Analytics script in your HTML page (you must be using the new "ga.js" version). Here is what mine looks like:

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-XXXXX-X");
pageTracker._initData();
pageTracker._trackPageview();
</script>

Insert the following code after that script:

<script src='http://java.com/js/deployJava.js' type='text/javascript'></script>
<script type='text/javascript'>
var jreVersions = deployJava.getJREs();
if (jreVersions.length == 0)
  pageTracker._setVar("Java: none");
for (var i = 0; i < jreVersions.length; ++i)
  pageTracker._setVar("Java: " + jreVersions[i]);
</script>

Please note, if you are using Blogger you will need to encode the '<' character as "&lt;". Open up Error Console in your browser to double-check that the script is not generating any errors.
Wait 24 hours for Google to update the report.
View the updated data at Google Analytics -> Visitors -> User Defined.
There are two ways to drill-down based on user-defined values:

Advanced Segments -> New -> Dimensions -> Visitors -> User Defined. For example:

Track users without Java: Matches Exactly: "Java: none"
Track users with Java: Starts With: "Java:" and Does Not Match Exactly: "Java: none"
Track users with Java 1.6: Starts With: "Java: 1.6"
You can then check how many Flash users also had Java installed. Or how many Windows users had Java 1.6 installed. Or the connection speed of Java users.

Visitors -> User Defined -> [Pick One] -> Dimension. For example, you can now click on Java: 1.6.0 -> Dimension -> Browser to find out what browsers visitors with Java 1.6.0 were using.

Publish your results here: http://forums.java.net/jive/thread.jspa?messageID=317425

Sunday, October 26, 2008

Integrating Google Guice into JUnit4 tests

You can integrate Guice into your JUnit 4 tests with three simple steps:

Add the following class to your classpath:

package com.google.inject.junit;

import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Module;
import java.util.List;
import org.junit.runners.BlockJUnit4ClassRunner;
import org.junit.runners.model.InitializationError;

/**
 * Uses Guice to inject JUnit4 tests.
 *
 * @author Gili Tzabari
 */
public class GuiceTestRunner extends BlockJUnit4ClassRunner
{
  private final Injector injector;

  /**
   * Creates a new GuiceTestRunner.
   *
   * @param classToRun the test class to run
   * @param modules the Guice modules
   * @throws InitializationError if the test class is malformed
   */
  public GuiceTestRunner(final Class<?> classToRun, Module... modules) throws InitializationError
  {
    super(classToRun);
    this.injector = Guice.createInjector(modules);
  }

  @Override
  public Object createTest()
  {
    return injector.getInstance(getTestClass().getJavaClass());
  }

  @Override
  protected void validateZeroArgConstructor(List<Throwable> errors)
  {
    // Guice can inject constructors with parameters so we don't want this method to trigger an error
  }

  /**
   * Returns the Guice injector.
   *
   * @return the Guice injector
   */
  protected Injector getInjector()
  {
    return injector;
  }
}

Customize GuiceTestRunner for your specific project by subclassing it. For example:

package myproject;

import com.google.inject.junit.GuiceTestRunner;
import com.wideplay.warp.persist.PersistenceService;
import myproject.GuiceModule;
import org.junit.runners.model.InitializationError;

/**
 * JUnit4 runner customized for our Guice module.
 *
 * @author Gili Tzabari
 */
public class GuiceIntegration extends GuiceTestRunner
{
  /**
   * Creates a new GuiceIntegration.
   *
   * @param classToRun the test class to run
   * @throws InitializationError if the test class is malformed
   */
  public GuiceIntegration(Class<?> classToRun) throws InitializationError
  {
    super(classToRun, new GuiceModule());
  }
}

Add @RunWith to all your JUnit test classes. For example:

@RunWith(GuiceIntegration.class)
public class MyTest
{
  @Inject
  public MyTest(SomeDependency foo)
  {
    ...
  }
}

That's it! Guice will now inject your test classes.

EDIT: I'm a convert :) I now use AtUnit to integrate Guice into JUnit.

Saturday, October 18, 2008

RESTful Dynamic Forms

One of the key characteristics of RESTful web services is (application) "statelessness". That is, all application state should be stored on the client-end and sent over with every request.

Sometimes clients are asked to manipulate forms dynamically, adding or removing elements, before submitting them to the server. Typically the client uses a "sessionId" to manipulate the page across multiple requests.

If you take a step back, however, you will realize that this application state doesn't belong on the server at all. The server doesn't really care about the intermediate page states, which is why it doesn't persist them to the database in the first place. It only cares about the initial page state (which it sends to the client) and the final page state (which it saves to the database). The server manipulates the page state on behalf of the client because it's easier to implement business logic in modern programming languages than it is to implement it in JavaScript on the client. Ideally the client should download the page, manipulate it on its end, and submit the result to the server.