Wednesday, October 6, 2010

Converting VLFeat SIFT output to Lowe-compatible SIFT output for Bundler

I've recently been trying to get Bundler working on my Mac. Bundler is part of a suite of tools (also including PMVS, CMVS, and others) that provides an open-source alternative to Microsoft Photosynth. Unfortunately, it depends on a closed-source implementation of the SIFT keypoint-detection algorithm written by David Lowe.

There are a few open-source implementations of the SIFT algorithm, and one is part of the VLFeat toolkit. Unfortunately, VLFeat's SIFT tool generates differently formatted output from the Lowe version, even though it's the same sort of information. I spent far too long hunting for a conversion script, and found no standalone (or even Bundler-ready) one, so I did it with awk.

If you are using Bundler, you can just update the ToSift.sh script, replacing the line:

echo "mogrify -format pgm $IMAGE_DIR/$d; $SIFT < $pgm_file > $key_file; rm $pgm_file; gzip -f $key_file"

with


echo "wc -l $key_file.vlfeat | awk '{ print \$1 \" 128\" }' > $key_file"
echo "awk 'BEGIN { split(\"4 24 44 64 84 104 124 132\", offsets); } { i1 = 0; tmp = \$1; \$1 = \$2; \$2 = tmp; for (i=1; i<9; i++) { i2 = offsets[i]; out = \"\"; for (j=i1+1; j<=i2; j++) { if (j != i1+1) { out = out \" \" }; out = out \$j }; i1 = i2; print out } }' $key_file.vlfeat >> $key_file"
echo "rm $key_file.vlfeat"


Once that's done, the RunBundler.sh script can be run as usual.

If you'd rather just convert existing VLFeat-generated key files, you can use:

zcat vlfeat.key.gz | wc -l | awk '{ print $1 " 128" }' > lowe.key

zcat vlfeat.key.gz | awk 'BEGIN { split("4 24 44 64 84 104 124 132", offsets); } { i1 = 0; tmp = $1; $1 = $2; $2 = tmp; for (i=1; i<9; i++) { i2 = offsets[i]; out = ""; for (j=i1+1; j<=i2; j++) { if (j != i1+1) { out = out " " }; out = out $j }; i1 = i2; print out } }' >> lowe.key
gzip lowe.key
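
For anyone who finds the awk hard to read, here is a rough Ruby sketch of the same conversion. It assumes what the awk above assumes: each VLFeat line holds 132 whitespace-separated fields (x, y, scale, orientation, then 128 descriptor values), and the Lowe-style file wants a "count 128" header, the first two fields swapped, and the descriptor broken into rows of 20. The method names are mine, not part of VLFeat or Bundler.

```ruby
# Convert one VLFeat SIFT line ("x y scale orientation d1 ... d128")
# into the eight lines a Lowe-style key file uses for one keypoint.
def vlfeat_line_to_lowe(line)
  fields = line.split
  unless fields.size == 132
    raise ArgumentError, "expected 132 fields, got #{fields.size}"
  end

  x, y, scale, orientation = fields[0, 4]
  descriptor = fields[4, 128]

  # Same swap as the awk script: second field first, then the first.
  out = [[y, x, scale, orientation].join(" ")]
  # Descriptor values follow, 20 per line (the last line holds 8).
  descriptor.each_slice(20) { |chunk| out << chunk.join(" ") }
  out
end

# Convert the whole (already decompressed) VLFeat output to Lowe-style text.
def vlfeat_to_lowe(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  header = "#{lines.size} 128" # keypoint count and descriptor length
  ([header] + lines.flat_map { |l| vlfeat_line_to_lowe(l) }).join("\n") + "\n"
end
```

To use it on a real file, read the decompressed contents (for example with Zlib::GzipReader from Ruby's standard library), pass them to vlfeat_to_lowe, and write the result to lowe.key before gzipping it.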


Hope this helps someone avoid the effort!

Tuesday, February 2, 2010

Stop Muni service cuts

This is a repost of an email I sent to Mayor Newsom to encourage him to avoid Muni service cuts. For more information on the issue, see the SF Bike Coalition's take, or the official Muni page.

Dear Mayor Newsom:

I usually agree with your policies, which is why your stance on Muni is so jarring to me. When I moved to San Francisco, I gave up my car, assuming correctly that I'd be able to get everywhere by bike and transit. Cutbacks in Muni services feel like a betrayal of trust -- will I have to start waking up earlier to get to work on time? Am I going to be effectively trapped at home on weekends when the weather is bad?

Especially in bad weather, I ride Muni almost every day. I take the 22 to and from BART so I can get to work, and on weekends, I often take the 19 downtown to shop. (The bus is such a better option than driving that I'll often push my roommates with cars to ride the bus instead of trying to park when we go to the Metreon for movies.) I visit friends in Cole Valley by taking the N.

You've expressed concern that increasing parking hours and prices will drive down visits to businesses, but from what I've read, the MTA's plan is not likely to result in the same disaster that occurred in Oakland. My roommates, who drive, never express concern about the prices of parking being too high. Rather, they complain about the lack of parking spaces, an issue which higher rates might actually improve!

In addition, although the following is not written by me, I thoroughly agree with all it has to say:

I am writing to urge you to find a way to make Muni work-- the proposed service cuts are unacceptable for a Transit First city. The proposal to reduce frequency will throw Muni into a downward spiral-- with riders unable to depend on regular service, people will stop riding Muni and driving more, creating more traffic, pollution, and reducing service even more.

Any service cuts must be off the table-- and I urge you to find alternative ways to fund Muni. We are a Transit First city and funding Muni must be your top priority. With the potential for millions of dollars by strategically extending parking meter hours, you can save Muni.

Private auto traffic congestion is the top reason for Muni delays-- and free, unabated parking encourages driving while starving the Muni budget. It is essential that you, as a 'green' mayor and leader of this Transit First city, support your appointee's approval of the recommendations of MTA Staff to extend meter hours into the evening and Sunday in specific commercial corridors.

Friday, September 11, 2009

Ruby's Marshal and ActiveRecord and PostgreSQL bytea fields

I'm posting this because it took me about a day to figure out what was happening, and I couldn't find any writeup of the problem when Googling. Hopefully this'll save someone else some time.

The problem:

Ruby's ActiveRecord does a fair amount of processing on any data you put into a database column; the behavior is data-type dependent, so various escaping and conversion can happen (making sure integer fields are numeric, and so forth). This works great (and transparently) almost all of the time, except when it comes to the BYTEA binary type. When storing BYTEA data, ActiveRecord escapes your data in an asymmetrical way, either using the PostgreSQL C API's PQescapeByteaConn, if it's available, or a pure-Ruby implementation that does the same thing. This happens whether or not you actually call ActiveRecord::Base.save on the ActiveRecord object; it's part of ActiveRecord::Base.write_attribute.

Unfortunately, PQescapeByteaConn and its complement PQunescapeBytea aren't symmetrical (see the documentation for PQunescapeBytea). In particular, backslashes are treated poorly. You can prove this with the following snippet:
>> require 'rubygems'
=> true
>> require 'pg'
=> true
>> str = "\\"; puts PGconn.unescape_bytea(PGconn.escape_bytea(str)) + " = " + str
\\ = \
=> nil
I first discovered this when trying to write data using Marshal, which can handle unexpected double-backslashes (it just treats them as a single one), but doesn't know what to do when a backslash is replaced by an unexpected character. When you Marshal a string, the output starts with "\004\010", a double quote, and then X, where \004 and \010 are the major and minor version of the Marshal format (4.8), the double quote marks a String, and X encodes the length: for short strings, it is the character at 5 + the length of the string. If a string is 87 characters long, X is a backslash (87 + 5 = 92; 92.chr == "\\"). Under most circumstances, this is okay, because as the string gets escaped and unescaped, you just end up with two backslashes, which Marshal deals with.

Unfortunately, if your string starts with three digits, as in "123andthen84charactersxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", the Marshaled output contains the substring "\123", which is treated as an octal escape when the data is unescaped, so what comes out isn't two backslashes, but rather the single character with octal code 123 (a capital "S"). Marshal.restore either returns a truncated string or throws an exception, depending on the details.

The solution:

Well, the easy solution is to use a TEXT field instead of a BYTEA field; the escaping and unescaping won't happen, and backslashes won't be an issue. If you've got more stringent requirements (data other than the output of Marshal), you can either escape your backslashes manually (after Marshaling or what-have-you), or go so far as to encode your string in a backslash-free scheme, like Base64.

What really makes this bug dangerous when using Marshal is that most of the time Marshal will hide the problem by accepting two backslashes in place of one. It's only if you have an 87-character string starting with three digits (that is, three digits followed by 84 other characters) that you'll see the error. (Actually, 87 is just the first collision; later Marshaled strings also have this problem, starting at some length after 12000.)
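
Here is a small sketch that reproduces the effect without a database, using three digits followed by 84 filler characters (87 bytes in all). unescape_bytea_sketch is a hypothetical stand-in I wrote for a bytea-style unescape step; it is not the pg gem's implementation, and it assumes the stored bytes get unescaped on the way back out as described above. The last few lines show the Base64 workaround using Ruby's built-in pack/unpack.

```ruby
# Hypothetical stand-in for a bytea-style unescape (NOT the pg gem's code):
# two backslashes become one, and a backslash followed by three octal
# digits becomes the single byte with that octal value.
def unescape_bytea_sketch(bytes)
  bytes.b.gsub(/\\(\\|[0-3][0-7]{2})/) do
    $1 == "\\" ? "\\" : $1.to_i(8).chr
  end
end

# Three digits followed by 84 filler characters: 87 bytes in total.
payload = ("123" + "x" * 84).b
dump = Marshal.dump(payload)

# Version 4.8, then '"' (34) for String, then the length byte:
# 87 + 5 = 92, which is a backslash.
header = dump.bytes[0, 4] # => [4, 8, 34, 92]

# The backslash length byte plus the leading "123" looks like an octal
# escape, so the unescape step collapses four bytes into one ("S").
damaged = unescape_bytea_sketch(dump)
restored = Marshal.load(damaged) # a truncated string of filler, not payload

# Workaround: Base64 output contains no backslashes, so the same
# unescape step leaves it untouched.
encoded = [dump].pack("m0")
roundtrip = Marshal.load(unescape_bytea_sketch(encoded).unpack1("m0"))

puts header.inspect
puts restored.bytesize
puts roundtrip == payload
```

In this run the damaged copy still loads, but as a shorter string with the digits gone; other inputs may raise instead, which matches the "truncated string or exception" behavior I saw.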

Anyway, hope this saves someone the trouble I had.

Monday, September 7, 2009

It's not that I have more free time

Several times in the past couple of weeks, I've been seized with a notion for a blog entry longer than a tweet. I've forgotten most of the topics, but I still want to talk about my new camera and some pictures I've been taking.

Looks like Fairyland - 16
A couple of weeks ago, I picked up a Canon PowerShot SD780 IS, which is turning out to be a great decision. It's so tiny I keep it in my pocket next to my phone. (Admittedly, between wallet, phone, camera, and keys, I think I've used up all my available pocket space.) I also ended up springing for the extended warranty, which explicitly covers my own "negligence", so I don't feel too worried trying to take pictures while riding my bike across the UCSF Mission Bay campus in the middle of the night. Someone also finally wrote a piece of software I've been waiting on for a couple of years now that lets me attach locations to pictures (geotag) in Aperture either by browsing around a Google Map or copying over a GPX tracklog from my phone. (A side benefit of using my phone to record my location is that I can make neat maps (w/ photos) of walking trips, as in The Park Less Traveled (map | photos).)

Next up, bad Rails/PostgreSQL bugs!

Sunday, May 27, 2007

Horse skeleton?


Originally uploaded by yostinso.

I think there's a horse skeleton in our creek...

Monday, May 7, 2007

Copyright and Bedouins: Part II: Copyright

Defining Copyright
Copyright is, at its most general, the "right to copy", as Wikipedia so eloquently puts it. In the U.S. (and many countries (pdf) that have accepted the Berne Convention), copyright defines a length of time during which the author of a work, be it photography, music, literature, etc., retains the rights to copying and use of that work. Under the Berne Convention, copyright is automatic — as soon as you create something, you own the copyright for some length of time. How long depends on the country; the Berne Convention defines a 50 year minimum (25 years for photography). Additionally, included in restrictions on the original work are restrictions on close derivatives; mimicry that isn't exempted (such as satire) is disallowed under a basic copyright.

The original goal of copyright is to increase the incentive for production of creative works. Without copyright, writing a book becomes a labor of love (or hate, depending upon the author), rather than a means of making money. Copyright prevents someone else from reading a book, copying it, and selling it themselves, with no profit to the original author. However, it's often unreasonable to require that an author be responsible for publication, marketing, and distribution of every book. Copyright can therefore be transferred (in whole or part) to others. A work can also be licensed for use in more specific ways. For instance, a musician can contractually transfer the whole copyright for an album to a record company in return for a guarantee of some amount of money per album sale (royalties). A record company can license a song for use in a movie for some fee; the movie producer doesn't own the copyright, but they're allowed a limited usage of the copyrighted work.

There are some exceptions to copyright. As mentioned above, satire is often considered outside the realm of pure mimicry. There are other examples of "compulsory licenses" that allow some use of copyrighted works. The Berne Convention allows exceptions to copyright where the exception will not interfere "unreasonably" with the desired use by the rights holder. In the U.S., this includes "fair use" and compulsory licensing. Fair use basically follows the Berne three-step test. For instance, non-profit educational use of a clip from a movie that doesn't detract from the revenue of the copyright holder is generally considered fair use. Compulsory licensing, on the other hand, means that if you cannot find the copyright holder for a work, you can just pay a filing fee to the copyright office and use that work until such time as the copyright holder comes forward. This allows some use of orphaned works although it generally only applies to music and television broadcasts. In the U.S., there are also compulsory licenses for webcasting/broadcasting music. The fee (plus the hassle) set by the copyright office determines in large part how much recording companies charge for licensing songs for radio play since stations can either choose to license from the record company or through the copyright office.

Re-defining Copyright
Copyright law, originally designed to protect a creator's ability to make a profit, has been thoroughly co-opted by corporate interests. A key turning point was the extension of copyrights under the Copyright Term Extension Act, a.k.a. the Mickey Mouse Protection Act; it added another 20 years to the length of a copyright, for a total of the author's life plus 70 years for personal copyrights, or 95 years from publication for corporate ones. The act was in large part lobbied for by Disney, but there are many who claim it's beneficial. Detractors point out that the only people making money off these longer-term copyrights are those that are wildly successful and generally corporate (e.g. Mickey Mouse).

More recently, the DMCA (Digital Millennium Copyright Act) has been sticking it to fair use, almost nullifying it in some cases. Passed in 1998 at the behest of the movie and recording industries, it's the same sort of knee-jerk failure to cope exhibited when there were strong efforts to ban VCRs as illegal instruments of bypassing copyright. Among other things, it prohibits even attempting to circumvent copy protection. For instance, DVDs are encrypted in a way that makes it impossible for companies that were not provided the decryption scheme by the recording industry to play movies without actively breaking the law. Some Norwegian kid went and broke the encryption anyway and released the means to do so to the Internet. However, it's illegal (under the DMCA) to use this software even for personal backups, basically preventing fair use copying of DVDs. Even if the copy-protection scheme is trivial (say the movie is recorded backwards), it is still illegal to circumvent it. There's no expiration either, so all DVDs are effectively copyrighted forever, which is in violation of the spirit (if not the letter) of the Constitution, which gives Congress the power "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

The DMCA does do one good thing: it absolves internet service providers from responsibility for copyright-infringing actions of their users. They must notify the users in question, and remove the offending material if there is no response, but they are not legally liable for its appearance in the first place. This is accomplished by means of a DMCA Takedown Notice. A particularly interesting case right now involves a demonstration by a law professor (Wendy Seltzer) in which she's posted on YouTube a 30-second clip of an NFL game in an educational, fair-use context. So far, the chronology is thus:
  1. The NFL sent a takedown notice to YouTube (who complied, removing the clip), possibly generated automatically by a program run specifically to detect and send notices for copyright violations.
  2. Wendy Seltzer sent a counternotice (generated automatically, hah!) basically pointing out that it was a fair-use clip. YouTube put the clip back up.
  3. The NFL sent another notice (in probable violation of the DMCA), and YouTube took down the clip again.
  4. Seltzer sent another counternotice, and the clip is back up.
For further exciting details, see Wendy Seltzer's DMCA fiasco archive.

Re-redefining Copyright (Creative Commons)
A standard "All Rights Reserved" copyright is very strict; it literally means that (outside of fair use) the author must be asked permission for every use of their content. Until somewhat recently, the only way to allow wider usage (e.g. for derivations (remixed music), non-commercial, or even complete usage with attribution) was to get a lawyer to write a custom contract or to make an informal and legally sketchy agreement. Creative Commons is an organization that has developed a set of legally thorough contracts that "give you the ability to dictate how others may exercise your copyright rights." (CcWiki FAQ) In effect, Creative Commons is extending copyright to allow easier publishing of material that doesn't need to be completely protected.

A basic set of pre-built licenses exists that can be chosen based on your needs. The core set of Creative Commons licenses all require attribution of the source of a work, but allow you to mix-and-match additional clauses. Non-commercial licenses require that your work or derivations not be used commercially. Share Alike licenses require that any use of your work be shared under the same license as the original. No Derivatives licenses allow use of your work, but only as long as the content is not modified. Additional information about these licenses can be found here.

Creative Commons licenses now apply to a vast amount of work. I mentioned previously that Flickr hosts over 30 million images for use in the public domain; these are being shared under various flavors of Creative Commons licenses. Flickr actually allows you to pick your license type as you upload your photos. Music, video, photos, software, and even books have been licensed for use (and reuse) under Creative Commons.

One especially interesting feature of Creative Commons licenses is that they're machine-readable. This means that a piece of software can easily determine what it's allowed to do with a given work. The example I keep using when trying to explain this is that of a screensaver that pulls pictures from a photo-hosting site like Flickr. Suppose you have a screensaver for sale that shows neato-keen pictures from Flickr and blends them together in interesting ways. Rather than having to individually contact all the Flickr users with pictures you want, you can design your screensaver software to automatically search for only pictures that are licensed for commercial, derivative use. Your piece of software is thus entering a contract with the creators of those pictures, but no human interaction is necessary, either by you, the user of your screensaver, or the creator of the pictures. Thus begins the trend towards software agents negotiating (legally binding) deals, which I'll imagine further in a later post.
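
To make the screensaver example a bit more concrete, here is a minimal Ruby sketch of the filtering step. It assumes each photo record carries its license as a standard Creative Commons URL (the creativecommons.org/licenses/by-nc-sa/2.0/ style), and reads the clause codes (by, nc, nd, sa) straight out of that URL. The photo list, titles, and method names are made up for illustration; this is not Flickr's API.

```ruby
require "uri"

# Split a Creative Commons license URL into its clause codes,
# e.g. ".../licenses/by-nc-sa/2.0/" gives ["by", "nc", "sa"].
def cc_clauses(license_url)
  parts = URI.parse(license_url).path.split("/").reject(&:empty?)
  return [] unless parts[0] == "licenses" && parts[1]
  parts[1].split("-")
end

# Commercial, derivative use needs attribution ("by") and must not be
# restricted by Non-commercial ("nc") or No Derivatives ("nd").
# Share Alike ("sa") is allowed here, but it places its own requirement
# on how the resulting work is licensed.
def commercial_derivative_use_allowed?(license_url)
  clauses = cc_clauses(license_url)
  clauses.include?("by") && !clauses.include?("nc") && !clauses.include?("nd")
end

# Hypothetical photo records, as a photo-hosting site might return them.
photos = [
  { title: "Photo A", license: "http://creativecommons.org/licenses/by/2.0/" },
  { title: "Photo B", license: "http://creativecommons.org/licenses/by-nc/2.0/" },
  { title: "Photo C", license: "http://creativecommons.org/licenses/by-sa/2.0/" },
  { title: "Photo D", license: "http://creativecommons.org/licenses/by-nd/2.0/" },
]

usable = photos.select { |photo| commercial_derivative_use_allowed?(photo[:license]) }
puts usable.map { |photo| photo[:title] }.inspect
```

The point is only that the decision can be made by software from the license identifier alone; a real screensaver would also need to handle attribution and any Share Alike obligations.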


Post scriptum:

By waiting long enough to post this article, I've actually lucked into a great example. Creative Commons has recently made the national news in relation to the upcoming US presidential debate. MSNBC hosted the recent debates for the Presidential Primary, but only posted the content online with commercials and an extremely limiting license that, among other things, disallowed "internet use". (See various blog posts.) For the upcoming June debates, both Barack Obama and John Edwards endorsed a proposal that the debates be released under a Creative Commons license. CNN has apparently agreed to do so for all the presidential debates they host.

Friday, April 13, 2007

Dry ice pressure cooker!


Originally uploaded by yostinso.

Note the double-pipette-tip pressure release valve to ensure maximum bag inflation without letting the bag push itself off the glass.