Farfetched Blog: 2005

Monday, November 28, 2005

Consistent Types

There's a problem with inconsistent downcalling, which I've been hankering after for a bit, and which C# supports using the "new" keyword.

The problem is that extension classes aren't really subtypes of extended classes anymore. They violate the Liskov Substitution Principle.

Interfaces for encapsulation don't have this problem.

(Though it'd be great if all languages had something like c#'s "override" keyword :).)

UPDATE: I was mistaken. Different methods are called on an object in different static "contexts". For example:

((ArtisticCowboy)joe).draw() runs one method, and
((Cowboy)joe).draw() runs another.

This is what C# does, but I don't know if it's wise.

Thursday, November 17, 2005

Future Refactoring UI

Refactoring tools have a good case of programmer automorphism. These tools present silly lists of operations like "decompose conditional" and "encapsulate downcast."

Really, automatic refactoring means the computer automatically makes all appropriate changes to accompany your change. It's all about moving and renaming. The tools should get rid of their complicated interfaces, and just support a mode "don't let me introduce any errors."

Simple changes are easy to spot, but moves need to be done with Cut&Paste. For example, cutting from one place and pasting to another should fix up the pasted code to work in the new location. It should also fix up callers to the code at the original location so that they now call to the new location. The user shouldn't have to specify what type of code is moving, or what type of move it's making.

If this could be made fast enough, programmers might want to always work this way. They'd just have to get in the habit of defining before using, and deleting usages before deleting definitions.

Wednesday, November 16, 2005

Everyone is Like Me, and bad UI

User interfaces suck because programmers think users care about programs.

Users do not care about programs. They think, "Just do it," and, "Get out of my face." Programmers love their programs, so they think everyone else does too. For example, they even write programs that hassle users to make implementation decisions.

In general, every person thinks that everyone is just like himself. Worse, every person thinks that everyone is like himself right now. In the linked example, even the developers won't care about the implementation a year after they've coded.

There is a word for this, though it seems seldom used: automorphism

Sunday, October 30, 2005

Interfaces for Encapsulation

The python people haven't succinctly explained why python doesn't need "private", "protected", and "public". In one sentence, python doesn't need visibility modifiers becase...

With sufficiently good refactoring support, encapsulation decisions don't affect code evolution.

Especially according to the XP design principle of collective code ownership, it's easy to change callers of an interface when the interface changes. Though people reading the code will need to understand the changes, that's far more efficient than making artificial barriers to reuse. Encapsulation is powerful; forced encapsulation is onerous.

This is true for a single organization, no matter how large its code base. Once code is "published" however, all bets are off. Code obviously can't be refactored across administrative boundaries. That being the case, we should care about "published" instead of "public".

For encapsulation of published code, separate interfaces are much more useful than visibility keywords. A class with a single public method is better represented by a separate interface containing a single method. When that interface is published, it never changes. In Java for example, the old interface could be kept in a separate jar file, which clients can continue using even after extended versions become available.

More originally, we should allow extending code through an interface. Extending through an interface means that new code will never downcall to a method that isn't in the interface. This is an elegant solution to the fragile base class problem. (Someone recently showed me the cute new c# modifiers "new" and "override". They protect against accidental downcalls, but they don't protect against replacment of methods called by collaborating classes.)

Extending through an interface might also help with downcalling into unrelated methods with mixins and multiple inheritance. For example, to create an "artistic cowboy" with multiple inheritance, we'll inherit from "artist" and "cowboy". Though we want his artistic nature to be dominant, we don't want the cowboy method "drawAndShoot()" to downcall into the artist method "draw()". We can achieve this by extending artist through an interface that excludes "draw()".

Monday, October 10, 2005

DRM and the Little Guy

Maybe published work isn't best treated like property, but publishers will certainly fight to keep it that way. Physical media have been pretty good at proxying ownership of information. If general purpose computers can't do as well, then publishers will just adopt special purpose devices instead. Besides keeping consumption cumbersome and inefficient, this'll give binding power to corporations, but not to individuals.

A general purpose DRM system can have a significant impact on society. If we're going to allow our computers to enter into contracts for us, we need some safeguards to avoid creating an information dystopia. The restrictions that we allow publishers to place on consumers should themselves be restricted. Generally, it shouldn't be possible for publishers to artbitrarily revoke access to their work. The set of digitally managed "rights" should be standard, easy to understand, and when possible, it should not require an ongoing relationship with the publisher.

This task is more difficult from the technology standpoint, and it'll detract from the generality of the full DRM dream, but it could work.

Friday, October 07, 2005

Money and Politics

So long as the only means of engaging in political dialogue is through purchasing expensive television advertising, money will continue by one means or another to dominate American politics.
--Al Gore

Templates for Environments

(The more surprising points are below.) Templates are useful for maintaining different deploy environments. Any large system needs to have a painless way to run in different environments, at least one "production" environment and one non-production environment. Examples of things that change between environments are: machine names, port numbers, database names.

Manually switching code from one environment to another is tedious and error-prone, but not all tools can make it completely automatic in all cases (for example, enabling stored procedures to run against differently named databases). Theoretically, templates are also be useful for unifying environment configuration across languages, but they're really essential for languages that don't support the necessary abstraction.

Even on platforms that have the necessary support, environment management can get gnarly. For example, java has a great configuration mechanism in the form of "properties" and the improved "preferences". If different sets of property files are used, however, it's easy for the non-production property sets to get out-of-date. The solution is to ensure that only a small fraction of properties are environment specific, and put only those in a separate file.

Templates generally break tools for live editing. For example, you can't "alter table" in a test database and easily have that change apply to your template. Or use a gui wizard to maintain templatized xml files for your application servers. We'd like to be able to "round-trip" templates, to be able to automatically generate templates from live files, as well as generate live files from templates. For this to be easy, the templates must meet two requirements:

there must be a one-to-one correspondence between template variables and the values in the environment used to generate templates.
template values must be unique.

An exampe of #1 is that a test environment must have as many different machine names as there are in production, even if those machine names are just aliases. Otherwise, the test environment will lose information.

An example of #2 is that a machine can't be named "foo" because substituting a template variable like "@FOO_MACHINE@" will also substitute "buffoon" and "foot".

Otherwise, the template generation will have to get complicated.

Tuesday, August 09, 2005

Google Maps

Google maps should support saving and restoring locations.

It should also support taking the union of the sets of locations created by different API sites.

Wednesday, August 03, 2005

Mixin Requirements

Proper mixins require two funny features:

never down-calling into unrelated mixins
a mechanism for disambiguating constructor parameters

Example 1:

class artist:
def draw(self): ...
class cowboy:
def draw(self): ...
def drawAndShoot(): self.draw() ...
artisticCowboy = mixin(artist, cowboy)()

artisticCowboy.draw() should call artist.draw,
but artisticCowboy.drawAndShoot() should call cowboy.draw,
without needing to be explicitly specified

Example 2:

class file:
def __init__(self, root): ...
class plant:
def __init__(self, root): ...

we need some mechanism for specifying plant's and file's constructor parameters independently

UPDATE: Silly me, number one would be great for multiple inheritance also.

Tuesday, August 02, 2005

Improving the GUI Editor 2

A standard XML format for representing GUIs would be a tremendous boon.

It would solve the biggest problems with current GUI builders: round-trip-ability and lockin.

It would enable a single builder to easily support GUIs that run with different languages and widget sets, including HTML.

The standard should specify the least common denominator, as well as extra features and their fall-backs. It shouldn't specify bindings or fancy template support, but it should specify how builders maintain information for features that they don't handle.

Monday, July 25, 2005

Mixins instead of Traits

On reflection (no pun intended), I can't see what advantage Traits have over mixins in a dynamic language like python. Specifically, python doesn't seem to be stuck with a total ordering.


import copy
def mixin(*classes):
  c=copy.copy(classes[0])
  if len(classes) > 1:
    c.__bases__ += classes[1:]
  return c()

class a:
 def f(self): return "a.f"
 def g(self): return "a.g"
class b:
 def f(self): return "b.f"
 def g(self): return "b.g"
class c:
 def g(self): return a.g(self)

obj=mixin(c, b, a)
assert(obj.f() == "b.f")
assert(obj.g() == "a.g")

Friday, July 22, 2005

Improve the GUI Editor

Beware the GUI Editor only for a little while. For one thing, Sun is finally fixing wysiwyg layout with GroupLayout and assorted swing changes, and demonstrated in netbeans. For most of Beware's other complaints, all we really need is good template support, just like what we've been using with html for a while.

Just like with html, users usually won't care about the api for manipulating their presentation, but that's OK. (The favored api for html is the javascript dom.)

Monday, July 18, 2005

Python Shell

In the quest for best re-use, I'd like my "shell" programming language to be my main programming language, and to just support some syntactic sugar and extra functionality for interactive use.

Windowing systems obsolesce job control, leaving tab completion, history, and prompting as the important shell functionality. Python's readline bindings can handle the first two.

LazyPython and IPython both allow ommitting parantheses, commas, and quotes around function parameters. They use standard python's sys.excepthook, so that they only extend python syntax that would've illegal anyway.

This has two problems. First, there are cases that don't work, like calling a method without parameters. Second and more importantly, excepthook still throws an exception, and so discards context. That means you can't do:for i in range(n): some_convenient_method i.

So in order to do it right, we'll have to hack the interpreter, and while we're at it, we could even support working directly with the input. For example, you type some_convenient_method and then a space, and the interpreter inserts a paranthesis. You always have legal python code. IPython prints the legal python code after you hit return, which isn't as nice.

Incidentally, until we have interfaces or inference, tab completion can only work with already instantiated objects. (For example you can't complete list().a to list().append.)

Wednesday, July 13, 2005

Electronic Paper

Should be on my link blog, but wow, Fujitsu is demonstrating electronic paper!

Highlights: as easy-to-read as paper, extremely low-power, flexible, color. Product to be available between one and two years (April 2006 to March 2007).

Friday, June 24, 2005

Is It You?

Rule of thumb:
if you have a bad experience with someone, it's just him.
if you have a bad experience with everyone, it's you.

Tuesday, June 21, 2005

Oppressive Corporate Firewalls

If you're behind an obnoxious corporate firewall, that doesn't allow any internet access except through a web proxy, you can still get out to your unix account.

Putty supports SSL tunnelling via SSL out-of-the-box:

on your server, add "Port 443" to your sshd_config
in putty new session, specify 443 in the port box, and
in its Connection/Proxy category, check HTTP and specify your proxy info

Monday, June 20, 2005

Perl Best Practices (for corporate coders)

In general, use perl for glue code and reports, not for business logic.

Perl code should be as simple as possible. That means:

Use as few language features as possible.
Use as few lines as possible, as long as they're readable.
Don't overdesign.
Don't try to write perl that looks like java.

Avoid rewriting functionality that is present in the standard library.

Separate functionality out into multiple small scripts, the smallest meaningful rerunnable units. For example, have one script handle generating a report and another script handle distributing it.

Always "use strict".

Avoid using $_ in loops except for regular expressions.

Avoid reversed flow control operators except for error handling ("doSomething or die()").

Avoid redundant comments.

Avoid overloading on context. (e.g. sub fu { wantarray() ? X() : Y() })

Avoid special symbol variables. Use the long versions and "use English" (e.g. "$INPUT_RECORD_SEPARATOR" instead of $/). Comment appropriately.

Avoid using shell tools (e.g "awk", "grep", and even "date" and "cp"). If perl can do it, do it in perl.

Prefer "my" over "local".

Put "my" declarations in the tightest possible scope.

Have users of modules explicitly import the tokens that they want (e.g. "use SomeModule qw( SomeFunc $SomVar )").

Avoid writing huge modules with lots of independent functionality; people are afraid to change them.

Avoid writing modules altogether, unless you're absolutely sure that your code will be reused.

If you're using references, have the first character of the expression reflect the type of the expression (e.g. use "$somearrayref->[3]" instead of "@$somearrayref[3]").

If your "configuration" is sufficiently complex (or unchanging) just define at the top of your script.

Wednesday, June 08, 2005

Next Generation Web

The next generation web will be all about abstractions for handling foreign code.

Of all the desirable features for a development platform, one of the most desirable is the ability to write and deploy code quickly and easily. This is how people explain the success of the web and web applications like the google suite.

The next generation web will protect users even from malicious code without requiring much greater responsibility. The new security will have to support three classes of features:

manage state
allow the code from one site to affect the user's experience of other sites (like greasemonkey for example)
clearly determine what site is responsible for a given interaction (to combat phishing for example)

Users will never be hassled "are you sure you want to install this extension?"; foreign code will have access to a powerful set of operations that yet poses no threat to the user. On the other hand, users will have to learn something of a new approach just to be able to understand what they'll be seeing.

Thursday, May 26, 2005

Another Solution to the Fragile Base Class Problem

Here's a simple python implementation of the solution to the fragile base class problem described in Modular Reasoning in the Presence of Subtyping, and more accessibly in John Cowan's blog. It uses classes in place of "divisions".

class sgd(type):
  def __init__(cls, name, bases, dct):
      overridden_bases = [ base for attr in dct.keys() for base in cls.__mro__ if hasattr(base, attr) and not attr.startswith('__') ]
      required_methods = [ (base,m) for base in overridden_bases for m in base.__dict__ if m not in cls.__dict__ and not m.startswith('__') ]
      if required_methods:
          raise "unimplemented methods:", required_methods

      super(sgd, cls).__init__(name, bases, dct)

class sgd_class(object):
  __metaclass__ = sgd

and then:

>>> class a(sgd_class):
...  def f(self): pass
...  def g(self): pass
... 
>>> class b(a): pass
... 
>>> class c(a):
...  def f(self): pass
...  def g(self): pass
... 
>>> class d(a):
...  def f(self): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "x.py", line 12, in __init__
    if required_methods: raise "unimplemented methods:", required_methods
unimplemented methods:: [(<class '__main__.a'>, 'g')]
>>>

Monday, May 16, 2005

People Don't Read, and What To Do About It

Ever had someone flagrantly not read your email? Try putting the most important stuff at the top.

Not reading is a lot more prevalent than you think. Most people don't enjoy reading. Even those who do have too little time to read everything they'd like.

The solution is to write your emails in order of decreasing importance. Don't write in chronological order, or even in logical order. Employ the journalistic technique of "heads, decks, and leads" or headlines, sub-headlines, and leading paragraphs.

This is most important for messages to groups of people, and for documentation messages. In other situations, there isn't so much motivation to write more text than will be read.

(This is ironic as a blog post; those who see this post presumably do read. But this advice is for you! You don't realize how unusual you are.)

Friday, May 13, 2005

Exceptions Are Good For You

Languages with exceptions are good for you; they make you realize that your code can be interrupted any time.

Consider:

#read user password without displaying on screen os.system("stty -echo") print "Password:", password = raw_input() os.system("stty echo") print

What happens if the user changes her mind, and kills the program? All subsequent typing in the shell will be invisible! Checking for errors won't help here.

The only difference in modern languages is that the rest of a program may continue to run even after a function has been interrupted. This creates a new kind of obligation, but not a big one:

you already have to worry about external state without exceptions (above example)
you don't have to worry about local function state, because the exception causes it to be thrown out
you do now have to worry about non-local state

Composable memory transactions would relieve us of even #3; an exception would roll back all the changes.

An effects system would at least catch the omission; it'd flag a function that makes multiple write calls to objects not mentioned in a finally cleanup block.

In languages without exceptions, you have to use a signal handler in the above example; in languages with exceptions, you can use a finally clause for the stty echo.

(In the spirit of the recent exception buzz (Joel Spolsky's popularization of Raymond Chen's rant, and GvR's "resource allocation" work)

Friday, May 06, 2005

Click For Each Page: ACM Queue

Some sites have an obnoxious policy of requiring you to page through their content by clicking "next", "next" a bunch of times. Presumably they do this to better gauge interest in the content; the user cared enough to click to the end of the article. (Web browsers don't naturally support reporting back to the server how far the user scrolls in a page or how long a user has a page open (and focused).) Or they could just be incompetent.

To get around this, you can often click their button "Print This Article" or equivalent.

For some sites, you may have to visit the last section and then click "Print This Article" there. Cute, eh? This is how I read ACM Queue.

Capacity For Self Deception

Someone should start a broad catalog of memes, not just pop-culture memes. Google book search shows our "capacity for self deception" going back to the 19th century.

Though it's easier to demonstrate this for criminals, it also applies to the best of us.

Friday, April 29, 2005

We Need Core Data

Something like Apple's new Core Data data-model-framework is long overdue, but we need a standard, free, cross-platform implementation.

The hierarchical filesystem is not meeting our needs. For all of our documents, we want:

easy persistence
undo and redo
gui prototyping
revision control
transactions
search
annotation

(Core Data only provides the first few features.)

This new API can't handle behavior and stay language/platform independent. That is, it can't support the object oriented approach of only allowing access to state through member functions.

On the other hand, it would be nice to use regular oop languages to specify the model.

Thursday, April 21, 2005

Justification of Python Sequence Subscripts

Which notation should be used for slicing a subsequence of m through n items?

myseq[ m : n ]
myseq[ m : n+1 ]
myseq[ m-1 : n ]
myseq[ m-1 : n+1 ]

#3 and #4 are silly, because they're inconsistent with the natural way to refer to a single item myseq[m].

Perl uses #1, which is wrong because you can't iterate down to an empty subsequence with myseq[m:m]. Representing this with myseq[m:m-1] is clearly evil, and it breaks completely for m = 0.

The rationale for indices starting from 0 instead of 1 isn't as strong: that representing the whole list as myseq[0:len(myseq)] is "nicer" than representing it as myseq[1:len(myseq)+1]. Perl doesn't even have this rationale :).

Ironically, reverse indices in python do go from 1 to n+1, or rather from -1 to -(n+1). This is fortuitous, because representing reversed(myseq) with myseq[-1::-1] is nicer than representing it with myseq[0::-1].

inspired by a note from Dykstra

Wednesday, April 20, 2005

Type, Effect, and Inference

Effect systems are nice for the same reasons that Type systems are nice, and they're also a big win for concurrency and for encapsulation. Effects, as I understand them, allow specifying how code reads and writes state.

If you're wondering how to convince people to create even more code annotation, don't worry; we'll be able to infer effect information from the code itself, just as we can infer type information from the code itself.

Even with perfect inference, explicit annotation will be useful for external data and for specifying contracts.

Thursday, April 07, 2005

RDF and the WWW

Just had an epiphany: RDF is very similar to the world wide web.

They're both edge-labelled directed graphs.

Though people generally aren't so careful with their link texts, these are just like RDF predicates. The linking page is the subject, and the linked page is the object.

Tomboy and the Future

Tomboy is an excellent program for linux.

It looks like a simple implementation of post-it notes, but it has a few features which make it wonderful:

automatic save
ability to easily link any text to a new note (sort of like a wiki)
fast full search
clickable web links

There are more features that'd yield incredible power:

"who links to me?" button
ability to list nodes that are linked to by all of a set of nodes
ability to open a node by path, with autocompletion
"publish" button, for a single node or for a group of nodes
ability to manage the graph graphically

#2 would enable the now fashionable folksonomy functionality.
#3 would replace bookmarks.
#4 would implement a blog authoring tool.
#5 would give you a mindmap.

This stuff shouldn't be in a particular application, but should be available to the whole system. It'll be great if this notes application demonstrates of the power of the approach, and motivates people to implement it more widely.

Monday, April 04, 2005

Corporate Personhood

Here's my attempt to distill the issue of corporate personhood. The best book on the subject that I've seen is The Corporation : The Pathological Pursuit of Profit and Power, by Joel Bakan.

Corporations are disproportionately powerful, because they concentrate huge amounts of money and therefore political influence. Is this good or bad? On the one hand, "what's good for GE is good for America." On the other hand, corporations by law must pursue no objective over maximizing shareholder value. This means externalizing costs whenever possible. For example, a corporation sells 100,000,000 widgets whose use will kill 100 people. That corporation may not legally settle for a smaller profit in order to make its widgets safer, and thereby save the 100 people. (The "what's good for GE" proponents may counter that the public sector makes such cold-hearted calculations all the time, and that the economic activity of the widgets is part of the "rising tide that lifts all boats," making the whole society wealthier and safer.)

What does this have to do with "corporate personhood?" Corporations concentrate huge amounts of capital, but they defuse almost all responsibility. When a corporation makes a million dollars, that money is shared by all its owners. When a corporation kills a person, that guilt is claimed by no one.

Though a corporation is fundamentally no more than a financial partnership, the existence of the corporate entity insulates its officers from morality and from the law. A bad action could spring from the combined actions of many members of a corporation, though none of the individual actions is bad. Even when a single member performs a bad action on behalf of the corporation, it's difficult to detect and prosecute. This insulation itself encourages bad behavior, because members' responsibility to the corporation is more pressing than their responsibility to the society and its laws.

Basing the rights of the corporation on the rights of the person gives corporations additional power. The law gives individuals various rights to protect them from the power of the state (e.g. the Bill of Rights). Sharing these rights with corporations gives them even more power. Treating people and corporations the same allows using a rhetoric of "freedom" to weaken and eliminate regulation, when regulation is the best tool for making corporations consider important externalities, and the only way to correct market failures.

Friday, April 01, 2005

Emulating maxdepth for Solaris find

Say you want to find files in the top level of a directory that meet a certain condition (e.g. haven't been accessed for at least three days).

GNU find can do this with "-maxdepth 1", but Solaris find can't. To get the same functionality for Solaris find, you can use this hack:
find /some/path/. $ -type d -a \! -name . -prune $ -o -type f -print

That is, first prune all directories that aren't named ".", then print the names of the remaining files. It's important that the path be "/some/path/." instead of "/some/path", because that's what enables special treatment for the top level directory.

Thursday, March 31, 2005

Partial Replacing Lambda?

Wonder if partial functions are going to replace lambda in python3000.

partial(operator.add, 1) isn't such an attractive substitute for lambda x: x+1, but that's more about operator than it is about partial().

Maybe partial()+1 should be supported using operator overloading. It'd be especially nice to be able to do partial()+x+y instead of partial(operator.add, x+y)

This form should probably be a differently named built-in, but built-ins aren't cheap, and the no argument constructor isn't otherwise meangingful.

Wednesday, March 23, 2005

Humble and Smart

The more you learn about something, the more you realize how little you know about it.

This isn't just philosophy; when people know nothing about a thing, they assume that there's "nothing to it". Then they confidently make decisions based on incomplete information. (It could be because interesting elements exploit the interplay of the simple and the complex. Or it could just be because small details aren't visible at a distance.)

Paradoxically, bright people are more susceptible to this thinking. Bright people are accustomed to quickly understanding ideas outside of their field of expertise, and this understanding is often superior to experts'. A bright person can start to believe (subconsciously) that much of the work in a field is characterized by lack of understanding.

This has ramifications for religious leadership. That religious leaders have insight into issues not technically within the sphere of religious practice is a doctrine of Judaism, and probably other religions as well. It's only logical; following Judaism is about living wisely, and someone with a deeper understanding of Judaism should have attained greater wisdom. In order to be truly wise, however, a person must be aware of his limitations. Maybe this awareness can be achieved by encountering sufficiently dramatic examples (for example the abuse cases involving religious leaders in recent years), or maybe it requires learning a little bit about alot of different things.

(It is troubling to think that G-d confers special wisdom in a way that is completely independent of the natural working of the world. For one thing, thinking that you've got supernatural aid from G-d certainly exacerbates the problem described above. For another, there is no good way to tell when you got it, and when you don't. "Trust in God, but steer away from the rocks", applies here as well.

At any rate, I don't think Judaism encourages completely ignoring the rules of the physical world. There's a gemara in niddah that gives advice about achieving various goals; in each case the gemara advises both acting according to natural way of the world, and also praying for success. In this era characterized by hester panim ("G-d hiding His face"), we certainly shouldn't abdicate the responsibility to do our hishtadlus (effort), especially where other people are involved.)

Friday, March 18, 2005

Idiomatic Python

Programming languages, like human languages, each have their own idiomatic usage. Though there are many different ways to express a single thing, some patterns of expression just feel much more natural.

The idiomatic style of python is beautiful.

Here's an example. You'd like to allow categorizing emails using their subjects, by prefixing the normal subject with the category and a colon. So for a message with subject "game on tuesday", you could use subject "sports: game on tuesday". If no category is specified, you'd like to use "misc". Here's the Pythonic way to do it:

try:
       category, subject = headers['Subject'].split(':')
except:
       category, subject = 'misc', headers['Subject']

There are a few different features at work here:

"It's easier to ask forgiveness than permission"
Exceptions are used heavily
Language support for collections is simple and easy

Friday, March 11, 2005

"Startups" and Small Businesses

At least since the dot-com bubble, the idea of the startup has been enshrouded in glamor. Most recently Paul Graham posted How to Start a Startup, but his advice is for dot-com-bubble type startups. He suggests that starting a company means compressing a career of 40 years into 4, and assumes that every new business will either flop or become an industry giant.

It could be that when Graham says "startup", he means a company that would rather flop than settle for staying small. Only talking about these startups is strange for a few reasons. Many companies consider themselves successful without making a public stock offering, or being bought out for piles of cash. How about creating a successful small company first, on the way to becoming an industry giant later? And which ideas and strategies are appropriate for which goal? These questions are a lot more relevant today than how to handle a Venture Capitalist imposed CEO.

After all, though more people are employed by big companies, there are many more small companies than big ones.

Wednesday, March 09, 2005

Note to Future Self

Whenever I put something down in an unusual place, there's a strong chance I will "lose" it.

Hopefully when I'm older I'll look back at this note and realize that I'm not going senile.

Tuesday, March 08, 2005

Ipod Shuffle for Lectures

The IPod Shuffle is a wonderful device. One more feature would make it perfect for listening to lecture series (or books on tape or language courses), without bloating its wonderfully simple interface.

The IPod Shuffle should remember the current "song" after it's turned off.

So apparently this is supported for purchases from ITunes and Audible. In order to get it to work with anything else, you have to use faac to create .m4b files.

UPDATE: It works great. If you're using linux, get gnupod, mpg321, and faac. Then you can do something like:

mkdir /mnt/ipod 2>/dev/null
mount /dev/sda1 /mnt/ipod
for mp3 in *.mp3
do
  m4b=${mp3%.*}.m4b
  mpg321 -w $mp3 | faac -b 10 -c 3500 - -o $m4b
  gnupod_addsong.pl -m /mnt/ipod $m4b
done
mktunes.pl -m /mnt/ipod
umount /mnt/ipod

Incidentally, it would be great if gnupod would expose a filesystem-type api to the ipod using fuse.

UPDATE#2: Don't use gnupod; use the iPod shuffle Database Builder.

Thursday, March 03, 2005

Warn of Unsaved Changes Javascript

OK, this will get all the javascript out of my system. This one is intensely useful for web "applications". It warns the user if she tries to leave the form with unsubmitted changes, whether leaving by moving to a different page or by closing the window. It requires a recent firefox or ie browser.

<body onLoad="lookForChanges()" onBeforeUnload="return warnOfUnsavedChanges()">
<form>
<select name=a multiple>
 <option value=1>1
 <option value=2>2
 <option value=3>3
</select>
<input name=b value=123>
<input type=submit>
</form>

<script>
var changed = 0;
function recordChange() {
 changed = 1;
}
function recordChangeIfChangeKey(myevent) {
 if (myevent.which && !myevent.ctrlKey && !myevent.ctrlKey)
  recordChange(myevent);
}
function ignoreChange() {
 changed = 0;
}
function lookForChanges() {
 var origfunc;
 for (i = 0; i < document.forms.length; i++) {
  for (j = 0; j < document.forms[i].elements.length; j++) {
   var formField=document.forms[i].elements[j];
   var formFieldType=formField.type.toLowerCase();
   if (formFieldType == 'checkbox' || formFieldType == 'radio') {
    addHandler(formField, 'click', recordChange);
   } else if (formFieldType == 'text' || formFieldType == 'textarea') {
    if (formField.attachEvent) {
     addHandler(formField, 'keypress', recordChange);
    } else {
     addHandler(formField, 'keypress', recordChangeIfChangeKey);
    }
   } else if (formFieldType == 'select-multiple' || formFieldType == 'select-one') {
    addHandler(formField, 'change', recordChange);
   }
  }
  addHandler(document.forms[i], 'submit', ignoreChange);
 }
}
function warnOfUnsavedChanges() {
 if (changed) {
  if ("event" in window) //ie
   event.returnValue = 'You have unsaved changes on this page, which will be discarded if you leave now. Click "Cancel" in order to save them first.';
  else //netscape
   return false;
 }
}
function addHandler(target, eventName, handler) {
 if (target.attachEvent) {
  target.attachEvent('on'+eventName, handler);
 } else {
  target.addEventListener(eventName, handler, false);
 }
}
</script>

Wednesday, March 02, 2005

Telephone Shell Javascript Widget

Here's some javascript I wrote to coerce uniformly styled phone numbers. If you don't use any punctuation, it defaults to american style. You can enter unrestricted text after the number. You should be able to copy the following into an html file to try it out.

<form>
 phone #<input onkeypress="return phone_only(event)">
</form>

<script>
function phone_only(myevent) {
        mykey = myevent.keyCode || myevent.which; //ie||netscape
        myfield = myevent.srcElement || myevent.target; //ie||netscape
        if (mykey == 8) //backspace (netscape only)
                return true;
        f = myfield.value;
        g = myfield.value;
        ndigits = f.replace(/-/,'').length;
        ngroupdigits = g.replace(/.*-/,'').length;
        if (ndigits == 0) {
                if (50 <= mykey && mykey <= 57) { //2-9, can't start with 0 or 1
                        return true;
                } else {
                        return false;
                }
        } else if (ndigits <= 7) { //only need 2 hyphens: 123-456-7
                if (32 <= mykey && mykey <= 47 && ngroupdigits != 0) { //punctuation
                        myfield.value += "-";
                        return false;
                } else if (48 <= mykey && mykey <= 57) { //0-9
                        if ((ngroupdigits % 4) == 3) {
                                myfield.value += "-";
                        }
                        return true;
                } else {
                        return false;
                }
        } else {
                return true;
        }
}
</script>

Tuesday, March 01, 2005

Forms Support in Javascript, not Template

Turns out it's pretty simple to populate forms generically in javascript. This means that your template engine doesn't need special html forms support; you can just use a pair of templating "for" loops to create the javascript data structure. You could even grab the data using xmlhttprequest. Html made the the mess, so html can clean it up.

Here's an example form, followed by the code. You should be able to copy into a file and try it.

<form>
<input name=a>
<input name=b type=checkbox value=x1>
<input name=b type=checkbox value=y1>
<input name=b type=checkbox value=z1>
<select name=c multiple>
 <option value=x2>X2
 <option value=y2>Y2
 <option value=z2>Z2
</select>
<button onClick="populateForm(this.form, {'a':'x0',b:['x1','z1'],'c':['x2','z2']}); return false">Populate</button>
<input type=reset>
</form>

<script>
function populateForm(myForm,myHash) {
     for (var k in myForm) {
             if (!(k in myForm) || !(k in myHash)) continue;
             if (typeof myHash[k] == 'string')
                     myHash[k] = [myHash[k]];
             if (myForm[k].type == 'text') {
                     myForm[k].value = myHash[k][0];
             } else {
                     var field = 'type' in myForm[k] && myForm[k].type.match('^select') ? 'selected' : 'checked';
                     var selected = Array();
                     for (var i=0; i<myHash[k].length; i++)
                             selected[myHash[k][i]] = 1;
                     for (var i=0; i<myForm[k].length; i++)
                             myForm[k][i][field] = myForm[k][i].value in selected;
             }
     }
}
</script>

Tuesday, February 22, 2005

Even More On Presentation Done Right

I changed my mind back; StringTemplate isn't the way to go.

Templates are neither necessary nor sufficient for separating model and view (presentation). Not necessary because the separation is really accomplished by the model/view protocol, and not sufficient because presentation code can always seep into the model code. In fact, the StringTemplate approach requires complex views to be implemented in the model code.

A great m/v separation may be achieved by just having separate code for model and for view, even in the same language. The only requirements are on the data passed from the model to the view:

it should be available before the view code runs
it should consist only of lists of lists, and lists of opaque objects

Flagrant violations are possible (like the view code writing to a database), but should be easy to spot. The view code isn't artificially limited; it can implement fancy animations or fractal decorations, for example.

It's probably worth noting explicitly that the view depends on the model and not vice versa.

Recent javascript interfaces from google seem to contradict requirement number one, but really they just use a lazy loaded model. The view treats the model as if it's present, but because of the model's prohibitive size, each piece is loaded only just before it's needed.

The StringTemplates author acknowledges that his solution is insufficient to keep view logic out of the model code. The model code could pass data like "the color red" to the view layer. Since StringTemplates is such a simple language, more complex view features must be implemented with the help of model code. Any possible document may be generated, but only with the help of the model code.

So if m/v separation isn't the point of templates, what is? Separation of roles. The roles of programmer and designer aren't necessarily performed by different people, but they do require different approaches. The designer role includes everything that can be done in a WYSIWYG editor, with all visible elements, "includes", and conditionals. As document formats evolve, this could become part of the normative editor functionality. CSS visibility could be hacked to support conditionals, and XInclude takes care of the rest.

Sunday, February 20, 2005

Google Maps and Free Source

If you haven't already, check out Google Maps; it's extremely cool. Besides being a generally superior implementation of maps, it loads new maps without reloading the page in your browser, just like gmail. After you view a map, its pieces are cached by your browser. This means they're fast, and can be viewed even without an internet connection.

A yet un-remarked-upon feature of these "network applications" is how similar they are to free source. Though the intellectual property restrictions of google and of free source are just as strict as those of the most proprietary software vendor, the implementations of both google maps and GPL'ed code are available for anyone to poke at. GPL developers have so much faith in their model that they don't worry about commercial competitors, and google developers apparently have so much confidence in their own skills that they don't worry.

You already have free access to both the google maps client software and the map data. Without too much difficulty, you could extend either. For example, you could use the google client to view your own map data, you could run your own software on the map data (which is already cached on your pc), and people are already hacking new features onto the whole google maps solution.

The major change in these new applications is that the interface is exposed. With regular google search, you either have to use the Google Web APIs or else screenscrape the html pages. With a little reverse engineering of the new applications, you can access the same interface that's used as part of their normal functioning.

(The Google Web APIs limit developers to 1000 queries a day, and presumably a higher volume of screenscraping or access to the new APIs would get shut down by Google.)

The Internet Changing Everything

Traditional categories of media become irrelevant
Does a given piece of information belong in a book, an academic paper, a weekly magazine article, an editorial, a letter to the editor? The internet makes the distinctions increasingly irrelevant.
The description of an idea may be canonicalized
A web page is an expression of an idea that is universally available (at least theoretically). It's easy to refer to, and easy to maintain. The internet will one day provide a distributed wikipedia, in which every idea has a URI and page(s) containing all public human knowledge of that idea.
The context of an idea may be completely specified

Traditional bibliographies are klunky in comparison. It will be possible to link a given piece of information with every other piece of information on which it relies. That metadata will itself be data, subject to easy analysis.

Apologies for early outline version.

Thursday, February 17, 2005

URL Features

Computers should record all operations on URLs. For example, I should have a list of URLs that I've emailed (along with the recipient and date), that I've instant-messaged, or printed, or transferred, or mentioned in any document.

"URL" means both local and remote. I should have a full history of every file I've opened (explicitly), even if it wasn't using the browser. Another operation that I want recorded is when I change files, and when I post forms.

I should get a smarter handling of "cached" and downloaded files. Browsers already have caches, so why I should have to manually make copies of files for speed or reliable access?

Thursday, February 10, 2005

Perl Problems

1 Obscure Features

The core language has so many features that even the greatest perl wizard does not know them all. Find a wizard and ask him how many of the following features from the core language he understands (without consulting the manual).

 split(' ') vs split(/ /) vs split('') vs split(//)

```
 $x{'a', 'b'} vs @x{'a', 'b'}
```
```
 $[ = 3
```
```
 [{foo => 1, bar => 2}, "FOO", "BAR"]
```

And here's a realistic example of idiomatic perl:

    sub with_gs_acct_check_digit{"$_[0]-".substr(10-(map$_[2]+=$_,map/\d/?$_[1]++%2?split'',$_*2:$_:0,split'',$_[0])[-1]%10,-1)}

Since these features usually address a real need, they can be very tempting to use. Even if someone were to do the work to specify a sane perl subset, there is no tool to enforce its use. Often a perl expert isn't aware that the language features that he uses are obscure or difficult for the nonexpert to understand.

Perl's fast feature accretion is a direct result of its core design philosophy, "There's More Than One Way To Do It".

2 Few Errors are Caught by the Compiler

Since every new feature is encoded using a fixed set of operators, very few errors actually cause a compiler error.

    print "Strings are equal\n" if "abba" == "baba";

2.1 Context Confusion

Every perl expression has several entirely different results depending on its "context", the code in which the expression occurs. The following example will trip up even experienced perl coders:

    sub myPrint($) {
        my $arg = @_;
        print "the first argument is $arg ?\n";
    }

3 Unsafe Features

Many commonly used features are unsafe. The "use strict" directive only catches the most blatant.

3.1 $_

For example, one of perl's major conveniences is the $_ short-hand for referring to the current index of a loop. The following is great way to print upper-cased versions of some strings.

    @t = qw( ibm dell msft );
    for (@t) {
        print uc "$_\n"
    }

However, if you want to instead use a different function in place of "uc", this code may fail in mysterious ways. Functions called in the body of the loop can alter what the $_ reference refers to, and neither the caller nor the callee may realize the danger.

3.2 Exceptions

Exception support is inconsistent and dangerous. When most function fail, they just return false and sometimes an error code. Other functions expect to be called in "eval blocks", and will otherwise cause the entire program to terminate (e.g. mkpath from File::Path)

3.3 Auto-vivification

Another example is auto-vivification, or the on-the-fly creation of datastructures. For the following data structure:

    use strict;
    my %myhash1=( "a"=>1, "b"=>2 );
    my %myhash2 =( "myhash1"=>\%myhash1, "c"=>3, "d"=>4 );

Perl will properly catch the error:

    $myhash3{c}++;            # %myhash3 is not defined

But will happily allow:

    $myhash2{myhash3}{c}++;   # a new hash will be created!

4 Data Structures

Nested data structures in perl are unwieldy. They require using references, and most people are confused by references.

Wednesday, February 09, 2005

Reusing Software

A big difference between software and physical engineering is that software changes greatly over the course of its life. The object oriented programming approach is supposed to make this change easier and more reliable. Instead of duplicating and modifying code, we'd like to extend existing code and make our changes in the extensions.

There are still a few impediments to reuse in popular platforms. The following is a small collection of research into these impediments.

There mustn't be conflicts between new symbols added to the base code and the extensions. It wouldn't be difficult for languages to support this better.

The non-private call graph of public and protected methods in the parent mustn't change.

There must be a way to better encapsulate objects.

There must be a way to encapsulate and compose concurrent access to state.

Thursday, January 27, 2005

Software Commodification

Though there's been a lot of talk about software commodification (and commoditization), I've never seen the issues formulated simply, so here goes...

A commodity is any product which meets some agreed upon criteria, such that any instance of that commodity may be substituted for any other. A quart of milk from any supplier will satisfy your cupcake recipe, no matter who produces it, so milk is a commodity. A necktie, however, is subject to all sorts of weird style criteria, so it is not a commodity. It might not even be your color.

All software programs are commodities in essence. Programs follow logical rules and perform to a specification, but we can't say a given program is a commodity before it has an actual substitute. So commodity nature is revealed when a given piece of software acquires competing alternatives.

Most software alternatives are not perfect substitutes in the sense that you can switch implementations and barely notice. Usually switching requires retraining the people that use the product and porting the software that uses the product. Vendors obviously want to keep switching as difficult as possible, and techniques for doing so are called "lock-in". By definition, strong lock-in hurts the consumer, but it also usually hurts the consumer by creating an inferior product. A product with strong lock-in will generally be monothilic, have complicated interfaces, and have hidden dependencies.

Given a choice, anyone (whether selling software or anything else) would prefer to have their product not to be a commodity. This is because markets are especially efficient at setting commodity prices, and this means suppliers will have stiff competition.

It's worse to have your software product become a commodity. This is because of the nature of software (and any non-physical product). The first copy of your software will cost you $2,000,000 but every subsequent copy will cost you only $2. This means that when someone enters your market, they've already done the hard part, and there's no incentive to leave. More importantly, the price of a software product is completely arbitrary. A price war could bring prices all the way down to zero.

Free competitors may be particularly formidable. Only with software is free competition realistic, because with physical products, revenue must reliably exceed costs. Free software makes it particularly easy to pool contributions from disparate sources, with very little administrative overhead, and very little reliance on a single controlling entity. (If a maintainer isn't doing a sufficiently good job, then someone may "fork" the development effort.) Free efforts also generally aren't motivated to lock-in.

All things being equal, it's desirable for software vendors to have the other software that they depend on to be commodified. This is for a few reasons: your dependencies have power over your product, and when your dependencies are cheaper, your overall product solution will be cheaper. There'll be a larger market for your product.

So if free software is so easy and desirable, how can a commercial software vendor stay in business? By presenting a moving target. Any given software release may be straightforwardly commodified, but the release process may not be. No entity or group of entities will have external incentive to track a series of upgrades. This is partially why "tying" is so important to a large software vendor, because tying is the only way for a mature product to escape commodification.

Ironically, the vendor is really in the business of selling timely upgrades, a service business which uses the base software product as merely a barrier to entry.

Wednesday, January 26, 2005

Intuitive Interfaces

An interface can be intuitive for a many reasons:

It works like other similar interfaces that the user has seen. (Green means "go".)
It's logical. (Cars' signalling interface correspond to the way that you turn the wheel.)
It's internally consistent. (The same abstraction works the same way in different contexts.)
It has a shallow learning curve.
- It has helpful pointers, ideally inobtrusive. (Examples are popup help bubbles and pie menus)
- It allows safe experimentation. (You can "undo" any operation.)
It's simple. :)

Python Evolution and AOP

An interesting trend in the evolution of the python language is that it has acquired ways of hooking into more and more code points that are not traditionally hooked into.

For a while, python has had metaclasses, which allow customizing the behavior of whole classes of classes :). These metaclasses allow running code on class creation, and routinely involve customizing object creation.

In order to obviate java's setter/getter pattern, python has the ability to run code on getting an attribute or setting an attribute or both.

Recently, python attained a decorator feature, which allows running code on function/method creation. This feature was originally motivated by wanting to specify things like a method being "static".

Now there's talk about implementing type-checking use adaption of function/method arguments and return values.

This trend could seem a lot like Aspect Oriented Programming, but it has the important difference that the path to the hook code is always evident at the hooked code. A main point of AOP is perfect "obliviousness", according to which you can alter the behavior of foreign code without making any modification to it. In python, the main code body can be oblivious of the hook, but it's always preceded by an explicit reference to the hook, almost like an annotation.

In python, each of these hooking mechanisms has a different syntax, presumably tailored to its primary use case. These features are mostly made possible by python's easy introspection support, and by its uniform "object" treatment of all of these abstractions.

Monday, January 24, 2005

Managing Change

It's time to release an upgrade to your whiz-bang widget library. If you want to avoid breaking all the code that people have written to use and even extend your library, here's what to do.

Obviously, don't remove public and protected methods, or change their semantics. For protected methods, this includes having them continue to call other nonprivate methods under the same conditions.

Less obviously, if you're creating new public or protected methods, you should move your code to an entirely new class! Someone may have extended your class with a method that has the same signature that you've just added, causing a needless conflict. Under the old class, create proxy classes that delegate every method to the new code. You don't have to worry about the maintenance overhead, because these proxies represent the legacy interface; you shouldn't have to change them later.

This hints at the fact that your library should be very careful about calling its own nonprivate methods. Calling internal nonprivate methods should be reserved for supported abstractions, because your library will down-call into extensions of these methods. If you don't want to support the abstraction, move the implementation into a private method. Have your internal code call the private method, and have your public method just be a wrapper for clients.

This approach is not so elegant because it requires code changes in order to support unchanging function. Maybe languages should support using interfaces for encapsulation, so that any symbols not present in the specified interface(s) would be completely hidden. However, all the recent python typechecking and adaption thinking seems to be focused on method calls, not base class specification.

Thursday, January 06, 2005

Minimum Curriculum

Everyone should learn the most important ideas of human civilization. There should be a year-long lecture course that introduces all the "must see" ideas, the best that the human race has found, the stuff that would be tragic for a person to have lived and never learn. Certainly many things require lengthy study to be useful, and many things are already as clear as day to many people, but there are many important ideas of which educated people are ignorant.

Even if this makes sense for ideas, it certainly makes no sense for art. The survey courses in dance, art history, music history, and world literature probably can't be meaningfully condensed any further. But no one should miss out on the major ideas of the introductory courses in math, astronomy, physics, chemistry, biology, psychology, economics, computer science, philosophy, political science, philosophy, and history. Some ideas fall through the cracks of all of these intro courses. (For example, what about the central points of The Death and Life of Great American Cities?) Some ideas are also too new to make it into conventional courses. For many ideas, only a few minutes explanation would be sufficient.

Many survey courses take the historical approach, but in this context it would waste too much time. To make it more interesting, the course (and more importantly the reading list) could touch on the limits of human knowledge and current work.

I originally had this idea in yeshiva, in a spirit of parochialism: "take this course, and then forget about the rest." Ironically, the course would probably not be popular with the insular (even if it were to omit evolution). This idea came back to me recently, thinking about John Rawls. I realized that neither my college survey in humanities nor intro to philosophy mentioned him.

The following are the ideas that I could come with quickly; please suggest more.

big bang, scale and elements of the universe, basic probability, limits, idea of calculus, newton's laws, four forces, electricity, nature of light, relativity, phases of matter, entropy, periodic table, cells, virii, bacteria, dna, systems of the body, evolution, milgram study, the unconscious, supply and demand, how a computer works and what a program looks like, urls, kant's categorical imperative, veil of ignorance, a myriad of -isms: utilitarianism, facism/socialism/capitalism(democracy), world religions, epistemology, problem of induction, logic and godel for dummies, mixed-use cities, human history in 30 minutes :), how toilets work

Wednesday, January 05, 2005

G-d Exists, so I could be Wrong

From yesterday's new york times article "God (or Not), Physics and, of Course, Love: Scientists Take a Leap":

David Myers
Psychologist, Hope College; author, "Intuition"

As a Christian monotheist, I start with two unproven axioms:

1. There is a God.

2. It's not me (and it's also not you).

Together, these axioms imply my surest conviction: that some of my beliefs (and yours) contain error. We are, from dust to dust, finite and fallible. We have dignity but not deity.

And that is why I further believe that we should

a) hold all our unproven beliefs with a certain tentativeness (except for this one!),

b) assess others' ideas with open-minded skepticism, and

c) freely pursue truth aided by observation and experiment.

This mix of faith-based humility and skepticism helped fuel the beginnings of modern science, and it has informed my own research and science writing. The whole truth cannot be found merely by searching our own minds, for there is not enough there. So we also put our ideas to the test. If they survive, so much the better for them; if not, so much the worse.

Tuesday, January 04, 2005

Core Assumptions

A college friend gave me a compliment; he said that when I argue with someone, I persist only until discovering his assumption that conflicts with my assumptions.

Wouldn't it be great if all disagreements could be reduced to conflicting core assumptions? If the parties could agree on this reduction, then maybe the assumptions themselves could be meaningfully discussed and compared.

Of course, I'm thinking of the electoral split of recent times. Could it be that conservatives simply don't accept the idea of the Veil of Ignorance?

Beating Java on Its Own Turf

The author of python, Guido van Rossum is writing some great stuff about type-checking in python, which will enable even more specific contracts.

This is exciting because it promises to achieve the safety of java with the simplicity of python. Safety is one of the three technical reasons that people still use java instead of python. The other two are performance and enterprise software. Performance is being addressed by the newly funded PyPy. Python already has much of the java enterprise technology, just not in the form of ballyhooed specifications. (For the remaining technologies, it's not clear that the "enterprise solution" achieves better reliability or scalibility than folks' nonstandard solutions.)

That the business community seems to care more about these specifications than about the actual technology may be for the same reason that it doesn't greatly prefer Sun's specifying Java over Microsoft's owning .NET. (Without a patent grant, the incomplete .NET standards work is almost worthless.) One would think that having one company write specifications for which many companies can build implementations would be much safer, and would create much more open software. That the wholely proprietary .NET is even considered a contender implies that people only care about the company that writes the interfaces.