Farfetched Blog: 2004

Monday, December 27, 2004

Special Characters and Gmail

Not only does gmail filter out ".", it also chops off everything from the first "+".

This enables using TMDA, or simply filtering on whichever address you chose to give out. (Similar to how some people give vendors their names with different middle initials, in order to track how their name got onto different lists.)
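Here's a minimal sketch of how a filter might recover the base mailbox and the tag, assuming only the two rules described above (dots ignored, everything from the first "+" dropped). The names `canonical_gmail` and `tag_of` are made up for illustration; they aren't anything Gmail provides.

```python
def canonical_gmail(address):
    """Reduce an address to the mailbox it is delivered to (hypothetical helper)."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("not an email address: %r" % address)
    # Drop everything from the first "+" onward, then remove the dots.
    base = local.split("+", 1)[0].replace(".", "")
    return base + "@" + domain


def tag_of(address):
    """Return the text after the first "+" in the local part, or "" if none."""
    local = address.rpartition("@")[0]
    return local.partition("+")[2]


print(canonical_gmail("joe.smith+vendor@gmail.com"))
print(tag_of("joe.smith+vendor@gmail.com"))
```

A filter can then key on `tag_of(...)` to see which vendor an address was given to.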

Monday, December 20, 2004

JOtherComboBox

Java is a little wack. Its selection box gui widgets have items which are either editable or not. The following is the necessarily verbose way to create an uneditable selection box that allows adding a new item with an explicit action. (Contrast with autocompletion in an editable field, which provides a very low barrier to adding new items.)

import java.awt.Component;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.KeyEvent;
import java.awt.event.KeyListener;
import javax.swing.JComboBox;
import javax.swing.JFrame;
import javax.swing.ComboBoxModel;
import java.util.Vector;

/**
 * JComboBox for which selecting a particular item allows editing a new item
 */
public class JOtherComboBox extends JComboBox {
	// index of the item that switches the box into editable mode
	int otherIndex;
	ActionListener otherItemActionListener = new ActionListener() {
		public void actionPerformed(ActionEvent e) {
			JOtherComboBox widget = JOtherComboBox.this;
			if (widget.getSelectedIndex() == otherIndex) {
				// the "other" item was chosen: clear the selection and allow typing
				widget.setSelectedItem(null);
				widget.setEditable(true);

				widget.requestFocus();
				// consume the first key event that reaches the editor, then detach
				widget.getEditor().getEditorComponent().addKeyListener(new KeyListener() {
					public void keyTyped(KeyEvent e) { work(e); }
					public void keyPressed(KeyEvent e) { work(e); }
					public void keyReleased(KeyEvent e) { work(e); }
					public void work(KeyEvent e) {
						((Component)e.getSource()).removeKeyListener(this);
						e.consume();
					}
				});
			} else if (widget.getSelectedIndex() >= 0) {
				// a regular item was chosen: back to an uneditable box
				widget.setEditable(false);
			}
		}
	};

	public JOtherComboBox(Object[] arg0) { super(arg0); init(); }
	public JOtherComboBox(Vector arg0) { super(arg0); init(); }
	public JOtherComboBox(ComboBoxModel arg0) { super(arg0); init(); }
	public JOtherComboBox() { super(); init(); }

	public void init() {
		this.addActionListener(otherItemActionListener);
	}

	public int getOtherIndex() {
		return otherIndex;
	}

	public void setOtherIndex(int arg0) {
		otherIndex = arg0;
	}

	public static void main(String[] args) {
		JFrame frame = new JFrame();
		JOtherComboBox comboBox = new JOtherComboBox(new String[] {"one","two","three","four"});
		comboBox.setOtherIndex(2);
		frame.getContentPane().add(comboBox);
		frame.pack();
		frame.setVisible(true);
	}

}

Friday, December 17, 2004

Databases Suck

That is, what we refer to as relational database management systems suck. They're too monolithic. An RDBMS typically has

a persistent store
a type system
indexing features
a transaction manager
a query language
a network protocol
usually a full embedded programming language

All this stuff makes them inflexible, and painful to integrate. People like them because they're completely independent of the rest of your code, even to the extent that a human being can often poke at them without application-specific software. They're also standardized-ish.

One Device

Even if single-purpose devices often work better, when it comes to mobile devices, people want one device that does everything. Otherwise, it can get to be a lot to carry around.

So here are the functions that are candidates for integration:

telephone
camera
image browser
audio player
storage media
broadcast radio receiver
audio recorder
web browser
email client
organizer
video player
video camera

Most of these have already been integrated into at least one cell phone product. When will we see it all come together? What a device to have hackable!

These super-devices will change the world. In order to do it, they'll...

use good voice recognition as the primary user interface
embrace a more end-to-end approach

Cell phone technology is still very far from end-to-end. Carriers will try to prevent the change, because it will completely overturn their businesses.

Thursday, December 16, 2004

Note about Gnome Storage

Not only will Gnome Storage make finding documents easy, remove the need for "Save", and therefore simplify task management, it will also replace a lot of existing interfaces.

Bookmarks, browser history, email indices, song playlists, photo galleries, "personal information management" views: working with these interfaces composes a large part of the time we spend with computers. Once all their info goes in a common storage layer, there'll be no reason to have many different metadata viewer implementations. (Of course, there'll be a mechanism to customize views.)

But that's not all. The desktop'll get many of the advantages of the web. We'll be able to create links to any object, whether it's an email message, a song, or a todo item. Any object may be bookmarked, and we'll get a history of all the objects that we've viewed.

This means big things for the semantic web movement, because it'll create huge amounts of metadata. Not only will we suddenly have easy, standard access to how often we listen to our favorite songs, we'll also know that they're predominantly in the "mp3s from joe" query folder, for example.

Once most applications can store their data in common, we can give them common implementations of other fun features: versioning and notification, collaboration and localization.

Monday, December 13, 2004

No Password for Local Users

Do you want to relieve local linux users of having to enter passwords? Follow these steps to disable default remote access, which will make it safe for default accounts to have no password.

create a group "remote", and add users to it that may login remotely
add the following lines to your /etc/security/access.conf:
```
+:remote:ALL
-:ALL:localhost
+:ALL:LOCAL
-:ALL:ALL
```

add the following line to your /etc/pam.d/system-auth:

account     required      /lib/security/$ISA/pam_access.so

add the following line in your /etc/pam.d/su:

auth       required       /lib/security/$ISA/pam_wheel.so use_uid deny group=remote

If you have a local domain, you might have to tinker more with access.conf, because the "LOCAL" source specification is funny; it matches any host or tty name without a ".".

This was tested with Fedora Core 2.

Of course, you could take the converse approach: give everyone a password, and just don't require passwords for local logins to accounts in the "local" group. This approach is probably more robust against configuration changes (e.g. if a service stops including "system-auth" it will allow logging in without a password). The first approach however is nice for granting minimum necessary privilege. Maybe both is best. Presumably the second approach would be done with a line in local service pam.d config:

auth       sufficient   pam_listfile.so item=user sense=allow file=/etc/localusers onerr=fail

and this works for gdm at least. "mingetty" (for text logins) is a problem because it shares the "login" pam config with "in.telnetd" for plain-text remote logins. Of course, you could hack it with pam_access again (and /etc/security/group.conf), but the extra complexity probably isn't worth it.

My Daughter

My daughter is the coolest. She just turned three. A couple of days ago she was watching the Elmo episode about music. ("Elmo" is a friendly, red furry creature, on a segment of Sesame Street for toddlers. Each Elmo episode has a theme.) Elmo's pet goldfish, Dorothy, had a toy drum in her jar. My daughter observed, "Dorothy can't play the drum because she don't got hands. She also don't got a stick."

Wednesday, December 08, 2004

Gcc on Solaris

If you're getting

/usr/include/sys/termios.h:376: parse error before `uint32_t'

then you should be invoking gcc as:

gcc -I/usr/include

Tuesday, December 07, 2004

Better Spam Protection

In short, it's a combination of "disposable addresses" and "challenge/response whitelists"...

1. Everyone in your addressbook can mail you without qualification
2. Everyone can respond to one of your private messages
3. Everyone can respond to one of your public messages within a fixed interval of time
4. Everyone else can mail you by confirming their first message by email

The "whitelist" refers to #1, and it may also incorporate the address of anyone new that you email. It's insufficient because many people send mail from a different address than they receive mail. #2 solves this problem by embedding a cookie in your return address (e.g. JoeSmith+keyword+8769d5.4f88c3@someISP.com). (Many ISPs allow you to receive messages to your account name appended with a plus or minus followed by any string.)

You can embed a hash of the address that you're sending to, plus a hash of that hash. The second hash allows you to check that the embedded cookie is valid. The first hash allows you to create a single return address for each address that you send to. If someone who knows that address ever sends you spam, you can blacklist that generated address only.
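The double-hash cookie can be sketched roughly like this. It's a sketch under assumptions, not TMDA's actual code: I'm assuming a keyed hash (HMAC with a private `SECRET`) so that outsiders can't mint valid cookies, six hex digits per hash to match the example address above, and made-up function names.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical private key; keep it out of public view


def _tag(text, length=6):
    """Short keyed hash of `text` as hex digits (the length is an assumption)."""
    digest = hmac.new(SECRET, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def make_return_address(user, domain, recipient, keyword="keyword"):
    """Build user+keyword+<hash of recipient>.<hash of that hash>@domain."""
    first = _tag(recipient.lower())   # one stable cookie per correspondent
    second = _tag(first)              # lets us validate the cookie later
    return "%s+%s+%s.%s@%s" % (user, keyword, first, second, domain)


def cookie_is_valid(address):
    """Check that the second hash really is the hash of the first."""
    local = address.rsplit("@", 1)[0]
    cookie = local.rsplit("+", 1)[-1]
    first, dot, second = cookie.partition(".")
    if not dot:
        return False
    return hmac.compare_digest(_tag(first).encode("ascii"), second.encode("utf-8"))


addr = make_return_address("JoeSmith", "someISP.com", "alice@example.org")
print(addr, cookie_is_valid(addr))
```

Because the first hash depends only on the recipient, every message to the same person carries the same return address, which is what makes blacklisting a single generated address possible.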

When you want to receive responses to a publicly posted address, it isn't a question of whether you'll eventually want to blacklist the address, but when. In that case (#3), you can embed a cookie which instead contains the date until which you'll accept unconfirmed messages.
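A dated cookie might look something like the following. Again this is only a sketch: the `dated` keyword, the `YYYYMMDD` layout, the keyed hash, and the function names are my assumptions, not TMDA's format.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET = b"change-me"  # hypothetical private key


def _tag(text, length=6):
    """Short keyed hash of `text` as hex digits."""
    digest = hmac.new(SECRET, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def make_dated_address(user, domain, today, days_valid=14):
    """Build user+dated+<expiry date>.<hash of expiry date>@domain."""
    expires = (today + timedelta(days=days_valid)).strftime("%Y%m%d")
    return "%s+dated+%s.%s@%s" % (user, expires, _tag(expires), domain)


def accepts_unconfirmed(address, today):
    """True if the cookie is genuine and its expiry date hasn't passed."""
    local = address.rsplit("@", 1)[0]
    cookie = local.rsplit("+", 1)[-1]
    expires, dot, tag = cookie.partition(".")
    if not dot:
        return False
    if not hmac.compare_digest(_tag(expires).encode("ascii"), tag.encode("utf-8")):
        return False
    # YYYYMMDD strings sort in date order, so a string comparison is enough.
    return today.strftime("%Y%m%d") <= expires


posted = make_dated_address("JoeSmith", "someISP.com", date(2004, 12, 7))
print(posted, accepts_unconfirmed(posted, date(2004, 12, 10)))
```

Mail to an expired address doesn't have to be dropped; it can simply fall through to the confirmation step.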

Even people who can't use the above methods should be able to reach you (#4). All of those people must undergo the slight annoyance of confirming their first message. Their first message will trigger an automatic response from you explaining the situation. They must respond once to that message, and they'll then be added to your whitelist.

The TMDA program does almost all of the work. #2 requires a two line patch. TMDA out-of-the-box is completely unconfigured, but my configuration looks pretty portable. I'll post it after I've been using it for a bit.

Tuesday, November 30, 2004

Future Interfaces

Computer interfaces should be dramatically improved, in the spirit of mpt's crufty rant. He sounds a refrain "We have the technology, so why....?" I don't think we have the technology yet, but we should.

No one should ever have to "save" a file. This requires a few things:

Unlimited undo and redo
Files should only be named with UUIDs
Files should be easily searchable using metadata

Once we've eliminated the save requirement, we can eliminate the unnatural concept of "open" files and windows. The only natural concept is whether a file is visible or not. Everything else should be a transparent optimization.

We can't just use unix inodes instead of UUIDs for two reasons: they don't work across multiple filesystems and they can't handle being removed and restored. I don't think there is a stable implementation of the metadata interface yet, though Gnome Storage looks promising.

Finally, applications will need to support transparent persistence. We can't just use generic checkpointing because it would be so wasteful. (For example, a web browser keeps a lot of stuff in memory, but it really only needs to persist a scrollbar location and a url (and maybe a cached copy of the url, which it'll need apart from the checkpointed image anyway).) Window managers would also have to be better at managing and persisting window arrangements.

Besides window sizes and locations, we'd like to manage tilesets; we'd like them to remember the fact that we're working with two documents side-by-side. Tzorech iyun (this needs more thought): sometimes tilesets just indicate the desire to work on two different things at the same time.

Monday, November 29, 2004

Traits for Python

Since python already has multiple inheritance, implementing traits isn't a matter of adding new functionality, but of restricting the functionality that's already there.

The metaclasses below enable preventing classes from having member data and preventing conflicting methods. (Just have your classes inherit from Trait or TraitUser.)

The next step is solving the Artistic Cowboy Problem in a pythonic fashion.

If you missed my earlier traits entry, here's the traits homepage link again.

#!/usr/bin/python
import operator, types, sets

def get_classdict(cls):
        "get name:attr dict from a class"
        names = dir(cls)
        classdict = {}
        for name in names:
                classdict[name]=getattr(cls, name)
        return classdict

def get_methods(classdict):
        "get a set of nonsystem methods from a classdict"
        methods = sets.Set()
        for (k,v) in classdict.items():
                if k.startswith('__'): continue
                if type(v) is types.MethodType or type(v) is types.FunctionType:
                        methods.add(k)
        return methods

class TraitClass(type):
        """Metaclass for classes that may not have state,
        whose conflicting trait methods must be resolved in extenders,
        and who may not extend classes which are not traits."""

        def __init__(cls, clsname, bases, classdict):
                type.__init__(cls, clsname, bases, classdict)
                cls.conflict_methods = sets.Set()
                all_methods = sets.Set()
                for c in cls.get_method_lists(bases, classdict):
                        methods = get_methods(c)
                        cls.conflict_methods |= all_methods & methods
                        all_methods |= methods
                #propagate conflicting methods from base classes
                for c in bases:
                        if 'conflict_methods' in dir(c):
                                cls.conflict_methods |= c.conflict_methods
                for c in bases:
                        #if c == object: continue
                        if type(c) != TraitClass:
                                cls.handle_base_class(c)

        def get_method_lists(cls, bases, classdict):
                "traits check for conflicts from parents and from self"
                return map(get_classdict, bases)+[classdict,]

        def handle_base_class(cls, c):
                "enforce only inheriting traits"
                raise "Trait %s cannot inherit from a class that is not a Trait"%cls.__name__,c.__name__

        def __new__(cls, clsname, bases, classdict):
                #if a nonexplicit class extends nontraits, make it a trait user
                if classdict.get('__metaclass__') != TraitClass and 0 < len([c for c in bases if type(c) != TraitClass]):
                        classdict['__metaclass__']=TraitUserClass
                #if class is actually a trait, make readonly
                else:
                        classdict['__slots__']=[]
                return type.__new__(cls, clsname, bases, classdict)

class TraitUserClass(TraitClass):
        """Metaclass for mostly normal classes,
        that must use single inheritance of nontraits,
        and that implement trait conflict handling"""

        def __init__(cls, clsname, bases, classdict):
                TraitClass.__init__(cls, clsname, bases, classdict)

                unresolved = getattr(cls, 'conflict_methods',sets.Set()) - sets.Set(classdict) - sets.Set(dir(getattr(cls,'nontrait_base',None)))
                if len(unresolved) > 0:
                        raise "conflicting methods",unresolved

        def get_method_lists(cls, bases, classdict):
                "check for conflicts in parent traits"
                nontrait_base = getattr(cls, 'nontrait_base', None)
                return [ get_classdict(c) for c in bases if c != nontrait_base]

        def handle_base_class(cls, c):
                "enforce single inheritance of nontraits"
                nontrait_base=getattr(cls,'nontrait_base',None)
                if nontrait_base:
                        raise "Trait-using class %s can extend no more than one class that is not a Trait"%cls.__name__, c.__name__
                cls.nontrait_base = c

        def __new__(cls, clsname, bases, classdict):
                #python weirdness; __metaclass__ doesn't override parent's here
                #use .get, since subclasses of TraitUser need not declare __metaclass__
                if classdict.get('__metaclass__') == TraitClass:
                        raise "Trait %s cannot inherit from a class that is a TraitUser"%clsname
                return type.__new__(cls, clsname, bases, classdict)

class Trait: __metaclass__=TraitClass #convenience object
class TraitUser: __metaclass__=TraitUserClass #convenience object
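For readers on a current interpreter, the conflict check at the heart of the listing above can be sketched in Python 3 syntax. This is a cut-down illustration, not a port: the name `ConflictChecked` is made up, it only checks for conflicting methods among the bases, and it doesn't enforce statelessness or single inheritance of nontraits.

```python
def public_methods(cls):
    """Map name -> callable for the non-dunder methods visible on cls."""
    return {name: getattr(cls, name) for name in dir(cls)
            if not name.startswith("__") and callable(getattr(cls, name))}


class ConflictChecked(type):
    """Hypothetical metaclass: same-named methods from different bases
    must be overridden explicitly in the class that combines them."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        seen, conflicts = {}, set()
        for base in bases:
            for method_name, func in public_methods(base).items():
                # The same function reached through two bases is not a conflict.
                if method_name in seen and seen[method_name] is not func:
                    conflicts.add(method_name)
                seen.setdefault(method_name, func)
        unresolved = conflicts - set(namespace)
        if unresolved:
            raise TypeError("conflicting methods in %s: %s"
                            % (name, sorted(unresolved)))


class Walker(metaclass=ConflictChecked):
    def move(self):
        return "walk"


class Swimmer(metaclass=ConflictChecked):
    def move(self):
        return "swim"


class Duck(Walker, Swimmer):
    def move(self):  # the conflict is resolved explicitly here
        return Walker.move(self)
```

Leaving `move` out of `Duck` makes the class statement itself fail with a `TypeError`, instead of silently picking `Walker.move` by base order.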

Python Metaclasses are Weird

$ python <<''
> class m1(type): pass
> class m2(m1): pass
> class c2: __metaclass__=m2
> class c1(c2): __metaclass__=m1
> print 'the type of c1 is', type(c1)
>
the type of c1 is <class '__main__.m2'>
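The same experiment can be repeated in Python 3 syntax, where the `metaclass=` keyword replaces `__metaclass__`. As far as I can tell the explanation is that Python uses the most derived metaclass among the one you name and the metaclasses of the bases, so a less derived request is quietly upgraded. The class names below just mirror the session above.

```python
class M1(type):
    pass


class M2(M1):
    pass


class C2(metaclass=M2):
    pass


# Asking for M1 here does not downgrade the metaclass: the base class C2
# already has the more derived M2, and the most derived candidate is used.
class C1(C2, metaclass=M1):
    pass


print("the type of C1 is", type(C1).__name__)
```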

Monday, November 22, 2004

Why XML is Case-Sensitive

# re: Tyranny of the geeks 11/20/2004 1:27 AM Tim Bray

XML markup is case-sensitive because the cost of monocasing in Unicode is horrible, horrible, horrible. Go look at the source code in your local java or .Net library.

Also, not only is it expensive, it's just weird. The upper-case of é is different in France and Quebec, and the lower-case of 'I' is different here and in Turkey.

XML was monocase until quite late in its design, when we ran across this ugliness. I had a Java-language processor called Lark - the world's first - and when XML went case-sensitive, I got a factor of three performance improvement, it was all being spent in toLowerCase(). -Tim
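A couple of quick checks show how slippery case mapping is even before locale enters the picture. This is only an illustration using Python 3's built-in string methods, which are locale-independent; the Turkish behaviour mentioned above needs locale-aware tailoring that these methods don't attempt. The `samples` dictionary is just a convenient place to put the results.

```python
samples = {
    # German sharp s has no single-character upper case: one char becomes two.
    "sharp_s_upper": "\u00df".upper(),
    # Capital I with dot above lowercases to "i" plus a combining dot.
    "dotted_I_lower": "\u0130".lower(),
    # Plain "I" always lowercases to "i" here; a Turkish-aware mapping
    # would want dotless "\u0131" instead.
    "plain_I_lower": "I".lower(),
}

for name, value in samples.items():
    print(name, repr(value), len(value))
```

So case folding can change a string's length, which is part of why doing it on every name comparison gets expensive.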

Friday, November 19, 2004

The Way to Manage Windows

Writing a window manager is hard (just like making war). I've no time to do it, so I'm going to describe my vision here.

Windows should never overlap. Windows overlap when one partially obscures another.
Windows should only ever be grouped by tabs or tiles.

Tiled windows are completely visible, and occupy all the available space because their edges line up. You can see an example in WindowsXP. If you have a bunch of windows open for an application, right clicking on that application's taskbar icon will give you "Tile Horizontally" and "Tile Vertically" commands.

Tabbed windows only have a single visible window, which also occupies the whole available space. Optionally, the system can display little "tabs" for switching between windows, or it can provide different mechanisms.

By nesting tiles and tabs, any reasonable window behavior can be represented. "Workspaces" and "virtual desktops" are essentially tabs. "Sticky windows" can be implemented with top-level tiles. The beautiful thing about having so few concepts is that features are available in many contexts.
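To make the nesting idea concrete, here's a toy model of a layout as a tree of tiles and tabs. The class names and the `visible()` method are invented for this sketch; a real window manager would need geometry, focus, and much more.

```python
class Window:
    def __init__(self, title):
        self.title = title

    def visible(self):
        return [self.title]


class Tiles:
    """All children are visible at once, side by side."""

    def __init__(self, *children):
        self.children = list(children)

    def visible(self):
        return [title for child in self.children for title in child.visible()]


class Tabs:
    """Only the selected child is visible."""

    def __init__(self, *children, selected=0):
        self.children = list(children)
        self.selected = selected

    def visible(self):
        return self.children[self.selected].visible()


# Two "virtual desktops" are just top-level tabs; the first one tiles an
# editor next to a tabbed pair of windows.
desktop = Tabs(
    Tiles(Window("editor"), Tabs(Window("docs"), Window("mail"), selected=1)),
    Window("browser"),
)
print(desktop.visible())
```

Switching desktops is then the same operation as switching tabs anywhere else in the tree: change `selected` on the relevant `Tabs` node.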

I'm still not sure what window operations should be available. Dragging a window to a different tabset or tileset should move it, but should we have a mechanism for swapping tiles? It should be possible to change a window from tile to tab and back, but what mechanism should there be for zooming a tileset?

Here's a tentative set of operations: each window gets "pull up" and "push down". For tabs, this is pretty self-evident. For tiles, it involves resizing the partner tiles, so that this tile is in the same place despite the push/pull. In addition to push/pull, tabs get an operation "change to tile", and tiles get two operations : "change to selected tab" and "change to unselected tab". These operations are reversible, so tile positions are recorded (and resettable). The dimensions of the window otherwise determine the default horizontal or vertical tile placement. Drag and drop "moves" both tabs and tiles, and they will become tabs and tiles of the drag target, respectively.

There are window managers that implement tiles, but they all have two problems: they don't use this beautiful dual idea, and they are almost completely unusable.

Unfortunately, since many applications make unwarranted assumptions, the window manager would have to support the legacy behavior also.

Editor Feature

When I cut Levi from "Reuven, Shimon, and Levi", and I paste it to before Reuven, I want my editor program to do the right thing. That is, I want my editor to display "Levi, Reuven, and Shimon" instead of "LeviReuven, Shimon, and ".

My java editor should also do the same thing for commas, but it's really inexcusable that java doesn't allow lists to have trailing commas (1,2,3,).
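The prose case can be sketched by treating the sentence fragment as a list rather than as characters: parse the items, move one, and render the separators again. This is a toy that assumes simple items and the "a, b, and c" style; the function names are made up.

```python
import re


def parse_items(text):
    """Split "a, b, and c" into its items, dropping commas and "and"."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", text)
    return [part for part in parts if part]


def render_items(items):
    """Join items back together in "a, b, and c" style."""
    if len(items) < 3:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def move_item(text, src, dst):
    """Move the item at index src to index dst, then fix up the separators."""
    items = parse_items(text)
    items.insert(dst, items.pop(src))
    return render_items(items)


print(move_item("Reuven, Shimon, and Levi", 2, 0))
```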

Thursday, November 18, 2004

Traits for Reuse

Traits are an exciting development in object oriented programming.

Traits are collections of methods, like objects without state. These methods are parameterized over other methods, which may be traditional object methods or other traits methods.

A class may inherit multiple traits, but without the problems of multiple inheritance and mixins. This is because 1) their composition isn't ordered and 2) they support composition operators for resolving conflicts.

Traditional classes still have single inheritance, and they contain all the state and glue code. Traditional inheritance is for subtyping, and traits inheritance is for reuse. The developers are quite conservative; they propose traits as an incremental addition to traditional languages. The traits methods may be visualized as code which is simply copied into their composing classes (almost reminiscent of linked editing).

The End of Even Numbered Linux Releases

Until recently, the Linux kernel development followed alternating cycles of adding new features and then stabilizing. The stabilization period was relatively short, and then development would branch off again. People would then be free to make low impact patches to the stable fork, but most developer attention would be focused on the new development fork.

With the most recent release, Linus has abandoned these alternating cycles. This is partly because new revision control tools allow him to better "cherry-pick", so that he can pull from other developers' own development forks. More importantly, though, the Linux distributions really want to make their own stable releases. They have their own release schedules, which they prefer not to tie to the kernel's schedule. The distribution kernels also get much greater user exposure (and testing) than Linus's own kernel, and so are better positioned to address stability issues.

The mainline kernel has become the permanent "unstable" kernel, albeit a little less unstable. Now the distributions contribute their patches back to the mainline and allow them to be beta tested there. If you want a rock stable kernel, you should get a distribution kernel, or even an enterprise distribution kernel.

Years ago, hackers debated whether Linux should remain a hacker system or if it should clean up its act and "go mainstream". Linux is remaining a hacker system, but companies have sprung up to build their own mainstream versions. It's nice that there's a business model in that.

Lee Revell describes the current situation nicely.

It remains to be seen whether there's a future for multiple distributions. It could be that, just like in the commercial software world, there can be only one. That is, network effects may favor a single distribution gaining the overwhelming majority of users.

Tuesday, November 16, 2004

My Bookmarklets

Bookmarklets are bookmarks that consist of javascript code. Here are two that I rewrote:

Right-click on one and select "Bookmark This Link" or "Add to Favorites", and then you'll always be able to use the feature by selecting the bookmark that you created.

Friday, November 12, 2004

Making Things Easy to Maintain

A lot of the cost of software is maintenance. Currently, There Is No Magic Bullet for making maintenance cheap. But there are a lot of best practices which I'm going to try to collect.

Make it as simple as possible.
Use test driven development.
Keep as little state as possible.

And some particular points.

Centralize environment configuration.
Make jobs rerunnable/idempotent whenever possible.
Give temporary resources short-lived unique names.
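The last two points can be illustrated together. In this sketch (the function name and file layout are invented), a job writes its output to a uniquely named temporary file and then renames it into place, so rerunning the job gives the same result and an interrupted run never leaves a half-written file under the final name.

```python
import os
import tempfile


def write_report(path, lines):
    """Write lines to path so that reruns are harmless (idempotent)."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp picks a unique name, so concurrent or repeated runs never collide.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".report-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out:
            out.write("\n".join(lines) + "\n")
        # Atomic rename: readers see either the old file or the complete new one.
        os.replace(tmp, path)
    finally:
        # If anything failed before the rename, don't leave the temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "report.txt")
write_report(target, ["a", "b"])
write_report(target, ["a", "b"])  # a rerun leaves the same result
```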

Tuesday, November 09, 2004

Note About Dune

The two most powerful institutions in the Dune series are the Bene Gesserit and the Bene Tleilax.

The BG is entirely composed of women, and the BT of men. Both houses prefer to exercise their influence quietly, the BG by only manipulating and never controlling directly; BT by assuming a guise of inferiority. In the Dune world, artificial intelligences are absent, so improvement of human beings is the main historic goal. The BG pursue this goal through training the mind and body, and by a "human breeding program", while the BT pursue it through direct genetic engineering. Each institution has its own approach to effective immortality; the BG share the memories of all of their female ancestors, and the BT clone themselves and infuse the clones with their original consciousness.

The BG incorporates motifs of the Christian Church (and possibly the Jesuits), and the BT of Islam. Both houses pursue a Messianic dream. The dream of the BG is to create the "Kwisatz Haderach", but the dream of BT has not been revealed. The Kwisatz Haderach, among other abilities, will have access to the memories of both his male and female ancestors. It's clear that these schools must somehow synthesize, and in fact this has been alluded to in the last books of the series.

Though much of the Dune lingo is taken from greek and arabic, "Kwisatz Haderach" almost certainly refers to the hebrew קפיצת הדרך ("skipping of the way") that figures in Chasidic literature. It's a miracle of travelling great distances in a short amount of time. Accordingly, it was the Miles Teg character, and not "Paul Muad'dib" who became the first Kwisatz Haderach.

Incidentally, Leto, Paul, and Leto II evoke Avraham, Yitzchak, and Yaakov.

Thursday, November 04, 2004

On G-d's Side

There are two problems with fundamentalism: arrogance and self-service.

The root cause is that when you decide (on some level) to do the right thing, you get comfortable with the idea that you are doing the right thing.

Believing that you are doing the right thing can make you arrogant, and it can make you intolerant of people who are doing different things. Believing that you are doing the right thing can cause you to compromise for your own benefit. It's a case of the ends justifying the means, but it's more seductive. You feel that those who are doing good should be enabled to continue doing good. But if you ever flag in your devotion, you're also guilty of hypocrisy.

The converse danger is paralysis. If you refrain from judging, there's no basis for choice. All the choices that you do make will be weakened.

Maybe we can just continue to act on our consciences, so long as we realize how little we understand the connection between our efforts and their actual effects.

Wednesday, November 03, 2004

2004 Election Issues

from cnn's exit poll:

of the people who considered each issue most important, how many of them voted...

	overall	for bush	for kerry
taxes	5%	57%	43%
education	4%	26%	73%
iraq	15%	26%	73%
terrorism	19%	86%	14%
economy/jobs	20%	18%	80%
moral values	22%	80%	18%
health care	8%	23%	77%

also, people who voted for bush overwhelmingly thought that the economy is doing well, and that we should have invaded iraq. people who voted for kerry overwhelmingly thought the economy is doing poorly, and that we shouldn't have invaded iraq.

Categories of Chumra

The idea of the chumra, or stringency, in Orthodox Judaism looms large. G-d appointed the Jewish people His holy nation and gave us His Law, but individuals have always strived to do more. There are actually a few different categories of chumra. Scholars have written about the subject, but in the form of manuals for getting closer to G-d. I'm just describing people's actions as they appear.

Ever since the disbanding of the Sanhedrin (central government of rabbinic judaism), religious legislation has been somewhat up in the air. This situation leads to the most fashionable type of chumra, the attempt to satisfy all the conflicting decisions made by a group of different decisors.

The group of decisors is often the rishonim (medieval scholars), and satisfying their decisions inspires many of the famous "Brisker Chumros". These chumros are more compelling when the issue is a matter of Biblical law rather than Rabbinic, but often the provenance itself is also in question. It is literally impossible to always satisfy all of the decisions of all the rishonim. [To be written: rashi and rabbenu tam tefillin example, rov example]

As we consider later decisions, it becomes increasingly less clear whether a given position is a chumra or actual halacha. At least for opinions of the rishonim, we usually have an idea of the halacha from the Shulchan Aruch. After that, we often have many different strong positions and no agreed upon principle for deciding amongst them. These questions are usually either less fundamental, or else they deal with technology and new situations.

A subcategory of satisfying all decisions is to observe authoritative decisions that are mostly disregarded. This disregard may be explained away by later authorities, or it may be mysterious. Either way, we are reluctant to suppose that the majority of observant jews are violating halacha, so these decisions take on the status of chumra. An example is Ezra's decree that after seminal emission, men should immerse in the Mikveh before learning or praying.

Another major category of chumra is that of ascetic practices. In general, the torah encourages--and perhaps requires--embracing life, but historically scholars have valued greater restraint. Maimonides famously praises the ideal of moderation, but his view of moderation looks harsh by today's standards. An example is fasting most Mondays and Thursdays.

Excruciating attention to detail is another major form of chumra, which some consider to be part of normative halacha. The classic example is that halacha specifies the proper order for tying your shoes. This trend goes back to the Talmud, which wrangles over the exact right place to first break open a loaf of bread. People may try to be meticulous in interpersonal relations as well (for example, not telling a companion at a restaurant that you dislike your meal, lest the waiter overhear and get hurt feelings).

Many of these chumras evoke awareness of G-d and godliness in every aspect of one's life, even the most picayune. It could be that accomplishing this awareness is more important than the actual details themselves.

Mysticism informs mainstream halacha, but it also forms a rich body of chumra. Presumably this category need not be limited to the traditional literature, but may include fringe texts, and even personal research into hidden worlds.

Much of civil law is about compromise, where neither party to a dispute is actually at fault. The Talmud describes "chasidus" as the habit of forgoing legal advantage in many such situations, and exceeding one's pecuniary obligation. [Source in bava kamma]

The last category of chumra is a biggy: do more good. This means strengthening the pillars of the world: learning more torah, giving effort and money to those in need, and praying more meaningfully.

Friday, October 29, 2004

Tradition

Anybody who's seen Fiddler On The Roof knows that Judaism is all about Tradition.

Why should that be?

Making tradition so central (and not just a particular tradition, but the institution of tradition itself), emphasizes the notion of the Jewish People as an entity, an organism.

Just as a person grows throughout his life, and interacts with his environment, so too, we are to imagine the Jewish People moving through history.

Wednesday, October 27, 2004

Overdue Library Book Notifier

UPDATED: allow excluding weekend days from the notification interval.

#!/usr/bin/python

#mail notification of due books on libraries using the Dynix Information Portal

#todo: renew each that is due, and mail only if fail



import re, time, smtplib

from urllib import urlopen, urlencode

from datetime import datetime, date



days_warning = 4

excluding_weekends = False

accounts = (

       ("1 2345 67890 1234","1234"),

       ("9 8765 43210 9876","9876"),

)

fromaddr = "somebody@somewhere.com"

toaddrs = ( "somebody@somewhere.com", "somebodyelse@somewhere.com", )

headers = [ "Subject: LIBRARY BOOKS DUE", ]



url = "http://leopac4.nypl.org/ipac20/ipac.jsp"

params = {

       "menu":"account",

       "aspect":"overview",

       "profile":"dial--3",

       "button":"Login",

}

itemsout_re = re.compile("href=.*?uri=.*?>(.*?)</a>.*?(\d{2}/\d{2}/\d{2}).*?(\d{2}/\d{2}/\d{4})")

trailing_nonword_re = re.compile("\W*$")



itemsdue = []

for params["sec1"], params["sec2"] in accounts:

       a = urlopen(url+"?menu=account") #get session ticket

       params["session"] = session = re.search(r"session=([\w\.]+)", a.read()).group(1)

       b = urlopen(url, urlencode(params)) #do login

       c = urlopen(url+"?session="+session+"&menu=account&submenu=itemsout") #get book list

       html_itemsout = c.read()



       today = date.today()

       warning = days_warning #per-account copy, so the weekend adjustment isn't added again for each account

       if excluding_weekends and (today.weekday() + days_warning) >= 5: #saturday

               warning += 2

       for i in itemsout_re.finditer(html_itemsout):

               due = datetime(*time.strptime(i.group(3),"%m/%d/%Y")[:6]).date() - today

               if due.days < warning:

                       itemsdue.append((trailing_nonword_re.sub("",i.group(1)),due.days))



if itemsdue:

       msg = "\n".join(headers+[""])

       for i in itemsdue:

               msg += "\""+i[0]+"\" is due in "+str(i[1])+" days\n"

       smtplib.SMTP('localhost').sendmail(fromaddr, toaddrs, msg)
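
The weekend option above approximates the adjustment by adding two days to the warning interval. As a rough alternative sketch (written for Python 3, unlike the Python 2 script above; weekdays_until is a hypothetical helper, not part of the original script), the non-weekend days remaining until the due date could be counted directly:

```python
from datetime import date, timedelta

def weekdays_until(due, today):
    """Count the Monday-Friday days after `today`, up to and including `due`."""
    count = 0
    day = today
    while day < due:
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday through Friday
            count += 1
    return count

# From Friday, October 29, 2004, a book due the following Monday
# is three calendar days away but only one weekday away.
print(weekdays_until(date(2004, 11, 1), date(2004, 10, 29)))
```

Comparing this count against days_warning would avoid changing the interval at all, at the cost of a loop per book.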

Wednesday, October 20, 2004

Exceptions

There are three broad categories of exceptions that should be handled:

Contract not met by client (either code or supporting configuration)
Resource not available
Programming error in provider code

There are three broad ways to handle them:

Try to fix it
- Reset state
- Address the issue specified by the exception
Just do without the functionality
Present the error
- Log it
- Report to the user

General rules of exceptions:

Throw early
Catch late
Handle exactly once
New exceptions are for new ways of handling
Use "finally" to do cleanup
Don't throw in constructors

If an exception can't be handled differently based on its type, you shouldn't create a new exception class for it.

With java 1.4 exception wrapping, there's no need to have separate exception classes for separate components.
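
Here is a minimal sketch of a few of these rules, in Python 3 rather than Java. The names ConfigError, read_port, start, and main, and the default port, are all made up for illustration. Python's "raise ... from" stands in for Java 1.4 exception wrapping.

```python
class ConfigError(Exception):
    """Contract not met by the client's configuration."""

def read_port(settings):
    # Throw early: reject a bad value as soon as it is seen.
    raw = settings.get("port")
    try:
        return int(raw)
    except (TypeError, ValueError) as e:
        # Wrap the low-level cause rather than defining a class per component.
        raise ConfigError("port must be an integer, got %r" % (raw,)) from e

def start(settings, log):
    log.append("acquire")
    try:
        return read_port(settings)
    finally:
        # Cleanup runs whether or not an exception propagates.
        log.append("release")

def main(settings):
    log = []
    # Catch late and handle exactly once: here, by doing without the setting.
    try:
        port = start(settings, log)
    except ConfigError as e:
        port = 8080  # hypothetical default
        log.append("fallback: %s (cause: %s)" % (e, type(e.__cause__).__name__))
    return port, log

print(main({"port": "80"}))
print(main({"port": "eighty"}))
```

ConfigError earns its own class only because main handles it differently from other failures; the wrapped ValueError or TypeError stays reachable through __cause__.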

Inspired by O'Reilly's Best Practices for Exception Handling.

Monday, October 18, 2004

Better Programmers' Editor

Programmers' editors should format as you type. Modern editors do this for highlighting, but they should also do it for spacing. For example, if you type:

    while(true){i+=1;}

the editor should immediately display:

    while (true) {

        i += 1;

    }

Cursor movement should also ignore that whitespace. (If the cursor is before the "i" above, pressing the left arrow should move the cursor to before the "{".)

Of course, the editor should also wrap long lines as you type.

This behavior is described in Displaying and Editing Source Code in Software Engineering Environments, but I haven't seen an implementation.

Just found Harmonia, which looks like it does it. An eclipse plugin is in the works.

Thursday, October 14, 2004

What Politicians Don't Talk About

While watching the presidential debates, it occurred to me that there's a lot of relevant stuff that politicians all know, but don't talk about.

For example, I don't think anyone disputes that a strong motivation for invading Iraq was fear of unfriendly power over America's interests, namely its interest in cheap oil. (Why not Libya?) Presumably there's a good political reason why Kerry also doesn't mention this.

It goes the other way too; much of what politicians talk about is completely irrelevant.

Besides being highly cynical, what's wrong with this state of affairs? It insulates much of the electorate from political reality, but you could argue that everyone would be almost as insulated anyway.

Are the members of one political party disproportionately guilty of this disingenuousness?

Monday, October 11, 2004

Spaces vs. Tabs

How should programs be indented, with tabs or with spaces?

This is not a question of whether the programmer hits the Tab key or the Space bar on the keyboard. It's a question of what actually goes in the file, and what happens when people want to use different numbers of columns to display indentation levels. Literal Tab characters are referred to as "hard tabs", and sequences of Space characters are referred to as "soft tabs".

(Almost) everyone agrees that a single file should use only one or the other for indentation. Mixing hard and soft tabs has more problems than either of the two pure approaches, and causes confusion. This is a religious war, but both approaches have their advantages:

Advantages of Soft Tabs

simplicity
columns may be arbitrarily aligned
xterm copying doesn't need special handling
yaml doesn't need special handling

Advantages of Hard Tabs

backspace and navigation work easily
number of displayed spaces is easily customizable
indentation semantic is represented directly, and already standard
makefiles don't need special handling

Arbitrary Alignments using Hard Tabs

With hard tabs, you can't portably line up your comments or constants like this:

doSomething();                             # this is a silly block of code,
doSomethingElseWithAMuchLongerStatement(); # which has a multi-line comment,
                                           # which extends past the code itself

All but the third line could easily be solved by using spaces for tabs that occur after non-whitespace on each line. Even the third could be excepted because it doesn't involve control flow. This motivates some people to recommend a hybrid "use tabs for indentation and spaces for alignment", and others keep it simple by recommending "use tabs for indentation and eschew alignment". Eschewing alignment could involve using:

mylist = [
    1,
    2,
    3
]

instead of:

mylist = [ 1,
           2,
           3
]

All this can be tricky, and reveals the most important advantage of soft tabbing: it's What You See Is What You Get, easily confirmable by checking that files contain no hard tab characters. It's harder to confirm that alignment isn't messed up when using hard tabs.
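
The check for soft tabbers really is short. A minimal sketch in Python 3 (lines_with_hard_tabs is a made-up name, and it flags tabs anywhere on a line, not just in the indentation):

```python
def lines_with_hard_tabs(text):
    """Return the 1-based numbers of the lines that contain a tab character."""
    return [n for n, line in enumerate(text.splitlines(), 1) if "\t" in line]

sample = "def f():\n    return 1\n\treturn 2\n"
print(lines_with_hard_tabs(sample))
```

An empty result confirms the file is purely soft tabbed. There is no comparably simple test for broken alignment in a hard tabbed file.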

Xterm Clipboard using Hard Tabs

Copying from an xterm converts hard tabs to soft tabs, so you have to configure your editor to convert them back.

Navigation using Soft Tabs

Just like a single keystroke creates an indentation level, a single keystroke should move the cursor over an indentation level. Modern editors allow you to back up an indentation level using the Backspace key, but don't yet allow you to navigate indentation levels using the arrow keys.

Customizing the Number of Displayed Spaces using Soft Tabs

Though it's certainly possible to munge source code in order to customize the tabstop when you're viewing code, it's a little risky. Tools for some languages (perl) don't make any guarantees that it can be done reliably. Even if you can do it reliably, you'll have to hack it, and if you also make modifications you'll probably cause bogus diff history.
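
To make the risk concrete, here is a naive sketch of such munging in Python 3 (reindent is a hypothetical helper, and it assumes every full multiple of the old width is an indentation level, which is exactly the guess that can go wrong):

```python
def reindent(text, old_width, new_width):
    """Rescale leading-space indentation from old_width to new_width per level."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        levels, remainder = divmod(indent, old_width)
        # Leftover spaces are presumably alignment; keep them unchanged.
        out.append(" " * (levels * new_width + remainder) + stripped)
    return "\n".join(out)

source = "if x:\n    y = [1,\n         2]"
print(reindent(source, 4, 2))
```

In the sample, the "2" was aligned under the "1" with nine spaces. The function treats eight of them as two indentation levels, so the continuation line ends up two columns to the left of the "1" it was aligned with. And every rewritten line shows up in the diff.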

Makefiles using Soft Tabs

The configuration files used by the make tool require hard tabs, so your editor has to handle this.

Semantics

There is some beauty in representing the semantic unit with a single character, and without extraneous formatting information (sort of like "\n" instead of "\r\n").

Editor Configuration

vim for soft tabbers:

set expandtab | set smarttab

and shiftwidth should be set to configure the number of spaces to use for indentation; tabstop will only be consulted for tabs not occurring at the beginning of the line. In order to handle makefiles, use:

au FileType make setlocal noexpandtab

vim for hard tabbers:

set noexpandtab

or maybe use a script to expand tabs later in the line

Update: see Elastic Tabstops

Platform Proliferation

Along the lines of the previous post, I hate platform proliferation. Even using the same programming language, code must be rewritten for each platform.

Windows and Mac each have a couple of platforms, as do the free desktop projects (Gnome and KDE), the major cross platform applications (Mozilla and OpenOffice), and most modern "languages" (e.g. Java, Python, OCaml).

People love the UNIX interface, in which everything is a file, and a file is a stream of bytes which may have a hierarchical name. That interface now appears to be insufficiently rich, but it's powerful exactly because it's so simple.

Maybe there's a new interface which is comparably simple, but can do 80% of what we want for components? Some examples of difficult modules to support are: graphical user interface, spell-checking, auto-completion, clipboard and search and undo, linking and embedding documents.

At the Desktop Developer Conference, Havoc Pennington talked about fragmentation of platforms.

Wednesday, October 06, 2004

One True Language

Techies often insist that it's good that there're many different programming languages, because each language has its strengths. They say that language design is all about making compromises, and that the compromises chosen for each are best for a particular problem domain.

This is a big mistake. For most applications, the benefit of using a specific language is minimal. The cost, however, of having code written in many different languages is huge. Code written in different languages is difficult to re-use. It's difficult to integrate. It's difficult to learn and support. In short, the language that code is written in is its single biggest dependency, and we are increasing and fracturing dependencies for no good reason. For a recent project at work, I had to write in several different languages: Java, Perl, Csh, SQL / Sybase stored proc language, XML, and a proprietary "Job Information Language". A single language could satisfactorily perform the jobs of all, and it would only need a single toolset.

Techies like the idea of a large language ecosystem because they love this stuff. They're attached to it, and want to see more of it. They love languages like shoemakers love shoes.

It almost doesn't matter what the one true language is, but of course I'm going to share some ideas about what I think it should look like.

The C programming language was an abstraction layer for making code portable. The next abstraction layer should make it as easy as possible to write and maintain code. All the other requirements are relatively insignificant.

So the language should be mostly procedural and object oriented. It should have automatic garbage collection. It should have built-in collection classes. It should be dynamically typed, but strongly typed.
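
"Dynamically typed, but strongly typed" may sound contradictory, so here is a small illustration in Python 3, which happens to behave this way (try_add is a made-up helper):

```python
x = 1        # the name x holds an int...
x = "one"    # ...and now a str: names carry no declared type (dynamic)

def try_add(a, b):
    """Return a + b, or the name of the exception the addition raises."""
    try:
        return a + b
    except TypeError as e:
        return type(e).__name__

print(try_add(1, 2))
print(try_add("1", 1))  # no silent conversion between str and int (strong)
```

The types are checked at run time rather than declared up front, but they are still checked.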

The Zen of Python (by Tim Peters)

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Tuesday, October 05, 2004

Property

When we talk about "intellectual property" (IP), we've already decided exactly how it should work. Calling it "property" makes us think we should treat it like a sandwich, though it's really more like the proverbial cake that you can eat, and have it too.

100 years ago, when you bought a book, you owned that book. You could do anything you wanted with your copy of that book except make a copy of it. The government gave publishers the right to prevent copying their books, in order to maximize the amount of publishing that would be done.

Now, when you buy a book, you haven't bought anything. You've licensed that book. The publisher retains all the rights except for limited reading privileges.

This isn't yet completely true, but it is the way that everyone already thinks of it. The law considers every digital access to be a "copy", so every use is governed by copyright. "Click-wrap" and "shrink-wrap" allow publishers to easily dispense with traditional copyright, and invent any bizarre license conditions they please. This is only logical, since the publisher owns the work.

What's wrong with strict intellectual property? Shouldn't the market discourage unreasonable license terms, like restricting the right to read books aloud?

Well, this market is inefficient. Publishers consolidate. Publishers are big, and consumers, in general, are small. This makes it difficult for consumers to negotiate. It's a shame that fair use will suffer, but there are two bigger problems with strong IP:

it's grossly inefficient
it could hamper the "prosumer revolution"

The internet makes it so easy to publish that potentially every consumer could become a producer of valuable creative work. Creative work is possible only by building on top of existing work. Clearing rights has become significantly more difficult than distribution. Most lawyers don't see anything wrong with increasing the need for lawyering in the world, but everyone else should recognize that the extra transaction costs are prohibitive.

It's hard to tell how inefficient our current system is, without real systems to compare it to. But it's equally hard to believe that authors are motivated by the potential revenues from their work 100 years from now. Less extreme cases are harder to judge.

With what should the intellectual property metaphor be replaced?

Many thinkers advocate "compulsory licensing", which requires everyone to pay a tax for funding new works. The proceeds from this tax would be divided up and awarded to authors based on the results of some sort of polling. It would be difficult to prevent gaming such a system. It would also be difficult to properly poll for derivative works. It would probably require administration to assign weightings to different types of work.

Alternatively, we could eliminate copyright law altogether, and have publishers just protect new content as best they can. This wouldn't be so different from today, except that content which is eventually freed would then be legal. Books would be the big loser in this scenario, because they are almost impossible to protect. Once comfortable reader devices are available, copying books will become rampant.

see Property, Intellectual Property, and Free Riding and The Darknet Paper

Monday, October 04, 2004

Hallel

ברוך אתה יהוה אלהינו מלך העולם אשר קדשנו במצותיו וצונו על נטילת לולב
ברוך אתה יהוה אלהינו מלך העולם אשר קדשנו במצותיו וצונו לקרוא את ההלל

הַלְלוּ-יָהּ:הַלְלוּ, עַבְדֵי יְהוָה; הַלְלוּ, אֶת-שֵׁם יְהוָה. יְהִי שֵׁם יְהוָה מְבֹרָךְ-- מֵעַתָּה, וְעַד-עוֹלָם. מִמִּזְרַח-שֶׁמֶשׁ עַד-מְבוֹאוֹ-- מְהֻלָּל, שֵׁם יְהוָה. רָם עַל-כָּל-גּוֹיִם יְהוָה; עַל הַשָּׁמַיִם כְּבוֹדוֹ. מִי, כַּיהוָה אֱלֹהֵינוּ-- הַמַּגְבִּיהִי לָשָׁבֶת. הַמַּשְׁפִּילִי לִרְאוֹת-- בַּשָּׁמַיִם וּבָאָרֶץ. מְקִימִי מֵעָפָר דָּל; מֵאַשְׁפֹּת, יָרִים אֶבְיוֹן. לְהוֹשִׁיבִי עִם-נְדִיבִים; עִם, נְדִיבֵי עַמּוֹ. מוֹשִׁיבִי, עֲקֶרֶת הַבַּיִת-- אֵם-הַבָּנִים שְׂמֵחָה: הַלְלוּ-יָהּ.

בְּצֵאת יִשְׂרָאֵל, מִמִּצְרָיִם; בֵּית יַעֲקֹב, מֵעַם לֹעֵז. הָיְתָה יְהוּדָה לְקָדְשׁוֹ; יִשְׂרָאֵל, מַמְשְׁלוֹתָיו. הַיָּם רָאָה, וַיָּנֹס; הַיַּרְדֵּן, יִסֹּב לְאָחוֹר. הֶהָרִים, רָקְדוּ כְאֵילִים; גְּבָעוֹת, כִּבְנֵי-צֹאן. מַה-לְּךָ הַיָּם, כִּי תָנוּס; הַיַּרְדֵּן, תִּסֹּב לְאָחוֹר. הֶהָרִים, תִּרְקְדוּ כְאֵילִים; גְּבָעוֹת, כִּבְנֵי-צֹאן. מִלִּפְנֵי אָדוֹן, חוּלִי אָרֶץ; מִלִּפְנֵי, אֱלוֹהַּ יַעֲקֹב. הַהֹפְכִי הַצּוּר אֲגַם-מָיִם; חַלָּמִישׁ, לְמַעְיְנוֹ-מָיִם.

לֹא לָנוּ יְהוָה, לֹא-לָנוּ: כִּי-לְשִׁמְךָ, תֵּן כָּבוֹד--עַל-חַסְדְּךָ, עַל-אֲמִתֶּךָ. לָמָּה, יֹאמְרוּ הַגּוֹיִם: אַיֵּה-נָא, אֱלֹהֵיהֶם. וֵאלֹהֵינוּ בַשָּׁמָיִם--כֹּל אֲשֶׁר-חָפֵץ עָשָׂה. עֲצַבֵּיהֶם, כֶּסֶף וְזָהָב; מַעֲשֵׂה, יְדֵי אָדָם. פֶּה-לָהֶם, וְלֹא יְדַבֵּרוּ; עֵינַיִם לָהֶם, וְלֹא יִרְאוּ. אָזְנַיִם לָהֶם, וְלֹא יִשְׁמָעוּ; אַף לָהֶם, וְלֹא יְרִיחוּן. יְדֵיהֶם, וְלֹא יְמִישׁוּן--רַגְלֵיהֶם, וְלֹא יְהַלֵּכוּ; לֹא-יֶהְגּוּ, בִּגְרוֹנָם. כְּמוֹהֶם, יִהְיוּ עֹשֵׂיהֶם-- כֹּל אֲשֶׁר-בֹּטֵחַ בָּהֶם. יִשְׂרָאֵל, בְּטַח בַּיהוָה; עֶזְרָם וּמָגִנָּם הוּא. בֵּית אַהֲרֹן, בִּטְחוּ בַיהוָה; עֶזְרָם וּמָגִנָּם הוּא. יִרְאֵי יְהוָה, בִּטְחוּ בַיהוָה; עֶזְרָם וּמָגִנָּם הוּא.

יְהוָה, זְכָרָנוּ יְבָרֵךְ: יְבָרֵךְ, אֶת-בֵּית יִשְׂרָאֵל; יְבָרֵךְ, אֶת-בֵּית אַהֲרֹן. יְבָרֵךְ, יִרְאֵי יְהוָה-- הַקְּטַנִּים, עִם-הַגְּדֹלִים. יֹסֵף יְהוָה עֲלֵיכֶם; עֲלֵיכֶם, וְעַל בְּנֵיכֶם. בְּרוּכִים אַתֶּם, לַיהוָה-- עֹשֵׂה, שָׁמַיִם וָאָרֶץ. הַשָּׁמַיִם שָׁמַיִם, לַיהוָה; וְהָאָרֶץ, נָתַן לִבְנֵי-אָדָם. לֹא הַמֵּתִים, יְהַלְלוּ-יָהּ; וְלֹא, כָּל-יֹרְדֵי דוּמָה. וַאֲנַחְנוּ, נְבָרֵךְ יָהּ-- מֵעַתָּה וְעַד-עוֹלָם: הַלְלוּ-יָהּ.

אָהַבְתִּי, כִּי-יִשְׁמַע יְהוָה-- אֶת-קוֹלִי, תַּחֲנוּנָי. כִּי-הִטָּה אָזְנוֹ לִי; וּבְיָמַי אֶקְרָא. אֲפָפוּנִי, חֶבְלֵי-מָוֶת--וּמְצָרֵי שְׁאוֹל מְצָאוּנִי; צָרָה וְיָגוֹן אֶמְצָא. וּבְשֵׁם-יְהוָה אֶקְרָא: אָנָּה יְהוָה, מַלְּטָה נַפְשִׁי. חַנּוּן יְהוָה וְצַדִּיק; וֵאלֹהֵינוּ מְרַחֵם. שֹׁמֵר פְּתָאיִם יְהוָה; דַּלֹּתִי, וְלִי יְהוֹשִׁיעַ. שׁוּבִי נַפְשִׁי, לִמְנוּחָיְכִי: כִּי-יְהוָה, גָּמַל עָלָיְכִי. כִּי חִלַּצְתָּ נַפְשִׁי, מִמָּוֶת: אֶת-עֵינִי מִן-דִּמְעָה; אֶת-רַגְלִי מִדֶּחִי. אֶתְהַלֵּךְ, לִפְנֵי יְהוָה-- בְּאַרְצוֹת, הַחַיִּים. הֶאֱמַנְתִּי, כִּי אֲדַבֵּר; אֲנִי, עָנִיתִי מְאֹד. אֲנִי, אָמַרְתִּי בְחָפְזִי: כָּל-הָאָדָם כֹּזֵב.

מָה-אָשִׁיב לַיהוָה-- כָּל-תַּגְמוּלוֹהִי עָלָי. כּוֹס-יְשׁוּעוֹת אֶשָּׂא; וּבְשֵׁם יְהוָה אֶקְרָא. נְדָרַי, לַיהוָה אֲשַׁלֵּם; נֶגְדָה-נָּא, לְכָל-עַמּוֹ. יָקָר, בְּעֵינֵי יְהוָה--הַמָּוְתָה, לַחֲסִידָיו. אָנָּה יְהוָה, כִּי-אֲנִי עַבְדֶּךָ: אֲנִי-עַבְדְּךָ, בֶּן-אֲמָתֶךָ; פִּתַּחְתָּ, לְמוֹסֵרָי. לְךָ-אֶזְבַּח, זֶבַח תּוֹדָה; וּבְשֵׁם יְהוָה אֶקְרָא. נְדָרַי, לַיהוָה אֲשַׁלֵּם; נֶגְדָה-נָּא, לְכָל-עַמּוֹ. בְּחַצְרוֹת, בֵּית יְהוָה-- בְּתוֹכֵכִי יְרוּשָׁלִָם: הַלְלוּ-יָהּ.

הַלְלוּ אֶת-יְהוָה, כָּל-גּוֹיִם; שַׁבְּחוּהוּ, כָּל-הָאֻמִּים. כִּי גָבַר עָלֵינוּ, חַסְדּוֹ--וֶאֱמֶת-יְהוָה לְעוֹלָם: הַלְלוּ-יָהּ.

הוֹדוּ לַיהוָה כִּי-טוֹב: כִּי לְעוֹלָם חַסְדּוֹ.
יֹאמַר-נָא יִשְׂרָאֵל: כִּי לְעוֹלָם חַסְדּוֹ.
יֹאמְרוּ-נָא בֵית-אַהֲרֹן: כִּי לְעוֹלָם חַסְדּוֹ.
יֹאמְרוּ-נָא יִרְאֵי יְהוָה: כִּי לְעוֹלָם חַסְדּוֹ.
מִן-הַמֵּצַר, קָרָאתִי יָּהּ; עָנָנִי בַמֶּרְחָב יָהּ. יְהוָה לִי, לֹא אִירָא; מַה-יַּעֲשֶׂה לִי אָדָם. יְהוָה לִי, בְּעֹזְרָי; וַאֲנִי, אֶרְאֶה בְשֹׂנְאָי. טוֹב, לַחֲסוֹת בַּיהוָה-- מִבְּטֹחַ, בָּאָדָם. טוֹב, לַחֲסוֹת בַּיהוָה-- מִבְּטֹחַ, בִּנְדִיבִים. כָּל-גּוֹיִם סְבָבוּנִי; בְּשֵׁם יְהוָה, כִּי אֲמִילַם. סַבּוּנִי גַם-סְבָבוּנִי; בְּשֵׁם יְהוָה, כִּי אֲמִילַם. סַבּוּנִי כִדְבוֹרִים-- דֹּעֲכוּ, כְּאֵשׁ קוֹצִים; בְּשֵׁם יְהוָה, כִּי אֲמִילַם. דַּחֹה דְחִיתַנִי לִנְפֹּל; וַיהוָה עֲזָרָנִי. עָזִּי וְזִמְרָת יָהּ; וַיְהִי-לִי, לִישׁוּעָה. קוֹל, רִנָּה וִישׁוּעָה--בְּאָהֳלֵי צַדִּיקִים; יְמִין יְהוָה, עֹשָׂה חָיִל. יְמִין יְהוָה, רוֹמֵמָה; יְמִין יְהוָה, עֹשָׂה חָיִל. לֹא-אָמוּת כִּי-אֶחְיֶה; וַאֲסַפֵּר, מַעֲשֵׂי יָהּ. יַסֹּר יִסְּרַנִּי יָּהּ; וְלַמָּוֶת, לֹא נְתָנָנִי. פִּתְחוּ-לִי שַׁעֲרֵי-צֶדֶק; אָבֹא-בָם, אוֹדֶה יָהּ. זֶה-הַשַּׁעַר לַיהוָה; צַדִּיקִים, יָבֹאוּ בוֹ. אוֹדְךָ, כִּי עֲנִיתָנִי; וַתְּהִי-לִי, לִישׁוּעָה. אֶבֶן, מָאֲסוּ הַבּוֹנִים-- הָיְתָה, לְרֹאשׁ פִּנָּה. מֵאֵת יְהוָה, הָיְתָה זֹּאת; הִיא נִפְלָאת בְּעֵינֵינוּ. זֶה-הַיּוֹם, עָשָׂה יְהוָה; נָגִילָה וְנִשְׂמְחָה בוֹ.

אָנָּא יְהוָה, הוֹשִׁיעָה נָּא;
אָנָּא יְהוָה, הַצְלִיחָה נָּא.

בָּרוּךְ הַבָּא, בְּשֵׁם יְהוָה; בֵּרַכְנוּכֶם, מִבֵּית יְהוָה. אֵל, יְהוָה--וַיָּאֶר-לָנוּ: אִסְרוּ-חַג בַּעֲבֹתִים--עַד קַרְנוֹת, הַמִּזְבֵּחַ. אֵלִי אַתָּה וְאוֹדֶךָּ; אֱלֹהַי, אֲרוֹמְמֶךָּ. הוֹדוּ לַיהוָה כִּי-טוֹב: כִּי לְעוֹלָם חַסְדּוֹ.

יהללוך יהוה אלהינו כל מעשיך וחסידיך צדיקים עושי רצונך וכל עמך בית ישראל ברנה יודו ויברכו וישבחו ויפארו וירוממו ויעריצו ויקדישו וימליכו את שמך מלכנו כי לך טוב להודות ולשמך נאה לזמר כי מעולם ועד עולם אתה אל ברוך אתה יהוה מלך מהלל בתשבחות

Monday, September 27, 2004

Open Source/Closed Standards

In the quest to open source java, it should be the trademark that depends on passing a test suite.

An example which hits the difference: a nonconforming "John's Java" would be illegal unless I call it something different, but borrowing a linked list implementation for my embedded linux toaster project would be fine. I'd think that trademark affects the big players here more than copyright.

I'm responding to this O'Reilly blog entry.

Wednesday, September 22, 2004

Complex Systems

"A complex system that works is invariably found to have evolved from a simple system that worked."
--John Gall

Friday, September 03, 2004

GPL better and worse than you thought

Worse, for two reasons:

Anyone can modify GPL'ed software to give it an interface for interacting with proprietary software. Those modifications can even be GPL'ed :).
Anyone can distribute proprietary software which links to GPL'ed software. The proprietary author can claim that his software could instead link to a non-GPL'ed implementation of the interface. It's the user that is performing the linkage, and since the GPL only restricts distribution, he certainly isn't culpable.

Now these reasons only apply to re-use of GPL software with other software. Improvements to GPL'ed software should still be illegal to make proprietary.

If the GPL doesn't prevent proprietary re-use, maybe maintenance costs do. Proprietary projects will have to port their software to new releases of the free software without support, because the free projects can't incorporate their changes.

Both gcc and the linux kernel appear to benefit from this effect. Linus Torvalds has refused to make technical compromises in order to ease the burden of proprietary module writers (even though their work is apparently legal). Richard Stallman has refused to make gcc more flexible lest he encourage proprietary extensions.

Better, because free software is actually less susceptible to patent litigation.

As a complete non sequitur, free software is an antidote to lock-in.

Thursday, September 02, 2004

HTML Screenscraping

Cool tool idea: an api that exposes the way a web page appears to a human viewer.

Such a tool would have ways to find page elements based on relative size and position with respect to other elements. It would allow grabbing "the same column in the next row of this table".

This would be complicated, of course, by weird formatting hacks whose elements aren't independently visible (like nested tables).

Even more ambitious would be to support touchy-feely things like finding a table by its "title", when that title is really a semantically unrelated, but visually tied element.

Using the tool would be a little frustrating, because it would never perfectly track all possible changes to the presentation of a site (e.g. some data may be moved between pages, or split up into multiple tables in the same page). Nevertheless, I think it would be a huge commercial success. It might also be a nice component in a semantic web migration framework.

Thursday, August 26, 2004

Kernels

"the _one_ thing that kernel land does well: safely synchronize globally visible data structures"
--linus

Wednesday, August 25, 2004

Perl vs. Java

"Perl encourages unreadably concise code; Java encourages unreadably verbose code."

Wednesday, August 18, 2004

History

"[A]re there lessons in history, or just stories, mostly sad?"

Monday, August 16, 2004

"Heuristic Canonicalization Schemes"

The semweb integration vision doesn't address the need to munge the contents of nodes. You can't have rdf guess that "jingleheimer schmidt" is the same as "jingle-heimer-shmid".

It would be cute to have a standard for transcribing proper nouns from any language. It would use a single case, no punctuation or spacing, and a set of letters that correspond unambiguously to the common subset of pronounceable sounds of all human languages.
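
The easy half of such a scheme (single case, no punctuation or spacing) can be sketched in Python 3. canonicalize is a made-up name, and limiting the alphabet to unaccented a-z is my simplification, not the universal letter set described above:

```python
import unicodedata

def canonicalize(name):
    """Lowercase a name and drop everything except unaccented letters."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed.casefold() if "a" <= c <= "z")

print(canonicalize("Jingle-Heimer Schmidt"))
print(canonicalize("jingle-heimer-shmid"))
```

The two results still differ ("schmidt" versus "shmid"), which is the hard half: matching them would need the sound-based alphabet, not just cleanup.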

Does this exist already? Let's try LazyWeb!

Friday, August 13, 2004

Solving the Fragile Base Class Problem

...in three simple steps, from A Study of The Fragile Base Class Problem. The findings in that paper are important, and deserve popularization. Maybe I'll get a chance to write the article, but in the meantime, here are the three simple steps:

Make member data private (unless it's really part of the interface)
Never modify a base class method to start or stop calling an existing public/protected method in that base class
Extenders should only expect that their overriding methods will get called by base class code when the overridden method is "protected"
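
The second step is the least obvious, so here is a sketch in Python 3 of what goes wrong when it's ignored. Bag, BagV2, and counting are made-up names for illustration, not taken from the paper:

```python
class Bag:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        # Version 1 appends directly, without calling self.add().
        self.items.extend(items)

class BagV2(Bag):
    """The same base class after a seemingly harmless refactoring."""
    def add_all(self, items):
        for item in items:
            self.add(item)  # now starts calling an existing public method

def counting(base):
    """Build an extension of `base` that counts the items added."""
    class CountingBag(base):
        def __init__(self):
            super().__init__()
            self.count = 0

        def add(self, item):
            self.count += 1
            super().add(item)

        def add_all(self, items):
            items = list(items)
            self.count += len(items)
            super().add_all(items)
    return CountingBag

for base in (Bag, BagV2):
    bag = counting(base)()
    bag.add_all([1, 2, 3])
    print(base.__name__, bag.count, len(bag.items))
```

Against Bag the count is 3; against BagV2 it becomes 6 for the same three items, although the extension's code never changed.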

UPDATE: Better, divide your classes into indivisible chunks of state.

Tuesday, August 10, 2004

Mazel Tov!

Three weeks ago, we had a baby boy. We named him Raphael. Mazel Tov!

The bris was beautiful, though we should've set up another table. I'm going to write here what I intended to say then, but didn't have time to.

The torah portion for that week, Devarim, was particularly appropriate for us. In the portion, Moses addresses the jewish people before they enter the land of Israel, recounting in a roundabout way their journey through the desert.

This was an entirely new generation from the one that left Egypt, the one that had been condemned to die in the desert for the sin of the spies. Moses recalls the exhortation to the old generation, "אַל-תִּירָא, וְאַל-תֵּחָת", "do not be afraid, and do not be discouraged." As we saw, this was apt advice. The jews sent spies to scout out the land, became fearful when they heard the report of its giants, and desired to return to Egypt.

The traditional explanation for this sort of foot-dragging is that the jewish people knew that G-d was capable of giving them the land, but were worried that they weren't worthy to receive His help. Certainly on the simple level, they doubted both themselves and G-d, and this became a self-fulfilling doubt. They made themselves unfit to enter the land.

The new generation had never experienced slavery. They had relied on no power but G-d. G-d was with them, and they were confident. To this generation, Moses could only advise, "אַל-תִּתְגָּר בָּם", do not contend with (most of) the peoples that you encounter when you enter the land. Apparently the biggest danger to the jewish people was the danger that they might misuse their strength.

Now we in the tenor hopefully aren't dying in the desert, but the contrast between the two generations powerfully echoes every parent's natural wishes for his children. We hope that our children will be able to go everywhere that we could not. We hope that they will be able to learn from our mistakes, and surpass us in everything. We hope that they will help to bring redemption for our people and for the whole world.

Monday, August 09, 2004

Important Technological Developments

I think these will be important in the next twenty years.

Ad-hoc, scalable wireless networking, with nice links at Reed's Locus Open Spectrum Resource Page
Peer-to-peer networking
Semantic Computing: The Semantic Web and GNOME Storage
PyPy, making efficient programming easier
Arch and Conary, making distributed development easier
E-Ink, or similar technology, making digital displays as comfortable as the printed page
Better voice recognition, possibly even of a whisper (This together with the pen interface will replace mouse and keyboard.)

Tuesday, July 13, 2004

Mixins Paper

Modular Object-Oriented Programming with Units and Mixins explains why mixins are so important. Why aren't more people turned on to this stuff?

Friday, July 09, 2004

The Point of JDNC, XAML, and XUL

My colleague assumed that the point was to enable round-tripping gui builders. He didn't even think that web deployment was a significant goal.

Now that he mentions it, I can't see that html/xml confers any technical benefit for running in the browser. What about round-trip editing? Is round-tripping gui building really so much easier with an xml dialect than with code which uses a simplified widget api?

Monday, July 05, 2004

Better AOP

Here's a better way to support Aspect Oriented Programming:

mixins!
wildcard methods
methods that override constructors (and superclasses) for a class

This approach is nice because it doesn't introduce an entirely new abstraction for extending code. With mixins, the caller can extend a class that the mixin author didn't know about. With wildcard methods, the mixin can be more flexible.
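
The first two ingredients can be approximated today. Here is a rough sketch in Python 3, where __getattribute__ plays the role of a wildcard method (that analogy is mine). TraceMixin, Account, and TracedAccount are made-up names, and constructor overriding is not covered:

```python
class TraceMixin:
    """Mixin that records calls to any public method of its host class."""
    def __getattribute__(self, name):
        attr = super().__getattribute__(name)
        if name.startswith("_") or not callable(attr):
            return attr  # plain data and private names pass through untouched
        log = super().__getattribute__("trace_log")

        def traced(*args, **kwargs):
            log.append(name)
            return attr(*args, **kwargs)
        return traced

class Account:
    """A class written with no knowledge of the mixin."""
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount

class TracedAccount(TraceMixin, Account):
    """The caller, not either author, combines the two."""
    def __init__(self):
        self.trace_log = []
        super().__init__()

acct = TracedAccount()
acct.deposit(10)
acct.withdraw(3)
print(acct.trace_log, acct.balance)
```

The tracing "aspect" never names deposit or withdraw, and Account is oblivious to being traced.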

Overriding constructors is probably the most controversial part, by which I mean language support for the pattern of my July 1 entry. For example, many component libraries consist of several classes, but for ease-of-use, only make a few of those classes available. In order to extend the behavior of the implementation classes, it would be nice to be able to do something like:

public class A2 extends A {

    B.new() {

        return new B2();

    }

}

Friday, July 02, 2004

Nature of Aspect Oriented Programming

If you've been frustrated about exactly what AOP is, you'll enjoy Aspect-Oriented Programming is Quantification and Obliviousness. Also interesting is the realization that Aspect-Oriented Programming is just "intercessional reflection"

Thursday, July 01, 2004

More on Object Oriented Design

In order for a class to be extensible, it must never obtain references to objects directly.

It must use separate factory methods whose sole responsibility is to obtain and return those references. That way, a subclass can easily override those methods to substitute different objects.

For example, instead of:

public class Fu {

    public void doSomethingWithBar() {

        new Bar().doSomething();

    }

}

use the following:

public class Fu {

    public void doSomethingWithBar() {

        newBar().doSomething();

    }



    protected Bar newBar() {

        return new Bar();

    }

}

which allows:

public class Fu2 extends Fu {

    protected Bar newBar() {

        return new Bar2();

    }

}

A Solution to the Fragile Base Class Problem explains (peripherally) why these methods should be protected.

This is the GOF factory pattern, but it should be applied more generally. Possibly people restrict its use because they're worried about components not being sufficiently well thought out. There may be some subtle dependency on the actual object implementations. To specify that there isn't any such dependency, it would be better to use an interface for the return type of these factory methods:

protected BarInterface newBar() {

    return new BarImplementation();

}

Wednesday, June 30, 2004

More on Minyan Sign-in

Here's a python implementation of the email part of my minyan counter design. Beware, it hasn't been tested.


#!/usr/bin/python



PREVIOUS_COUNT_FILE = '/tmp/m-last'

YIDDEN_FILE = '/home/josh/minyan_goers'

INTERVAL = 60

LATE_INTERVAL = 5

LATE_TIME = '14:00'

CUTOFF_TIME = '15:10'

PROXY = {'http':'http://aproxyserver:82'}

TALLY_URL = 'http://aminyancounter.cgi'

FORM = '<form action="'+TALLY_URL+'" method="post"><input type="hidden" name=email value="$EMAIL" /><input type="submit" value="yes" /><input type="submit" value="no" /></form>'



from os.path import getmtime

from datetime import date, datetime, timedelta

from time import time, mktime, strptime, sleep

from sets import Set

from smtplib import SMTP

from email.MIMEBase import MIMEBase

from email.MIMEMultipart import MIMEMultipart

from dstring import dstring, safedict

from re import compile, findall

from urllib import urlopen



def my_print(*s):

	print datetime.now().strftime('%H:%M:%S'), ' '.join([str(x) for x in s])



def parse_time(s):

	struct = list(strptime(s, '%H:%M'))

	struct[0]=1970 #can't have negative seconds since the epoch

	return datetime.fromtimestamp(mktime(struct))



class MinyanMessage(MIMEMultipart, object):

	def __init__(self, yidden):

		super(MinyanMessage, self).__init__('alternative')

		self.yidden = Set(yidden)

		self.plain = MIMEBase('text', 'plain')

		self.html = MIMEBase('text', 'html')

		self.attach(self.plain)

		self.attach(self.html)

		self['From'] = 'me'

		self['To'] = self['Subject'] = '' #set for each recipient



	def lost_it(self, accepters=(), decliners=(), last_call=False):

		self.recipients = self.yidden - Set(decliners)

		self.text=['we need $NEED_COUNT for a minyan since someone cancelled', FORM]

		if last_call:

			self.last_call_adjust_text()

		self.send()



	def got_it(self, accepters=(), decliners=()):

		self.recipients = accepters

		self.text=['we have a minyan!', '']

		self.send()



	def not_got_it_yet(self, accepters=(), decliners=(), last_call=False):

		self.recipients = self.yidden - Set(accepters) - Set(decliners)

		self.text=['we need $NEED_COUNT for a minyan', FORM]

		if last_call:

			self.last_call_adjust_text()

		self.send()



	def cancelled(self, accepters=(), decliners=()):

		self.recipients = Set(accepters)

		self.text=['there will be NO minyan today', 'Only $HAVE_COUNT people signed up.']

		self.send()



	def last_call_adjust_text(self):

		if True:

			self.text[0]='LAST CALL! '+self.text[0]

			self.text[1]='If people don\'t sign up by $CUTOFF_TIME, there won\'t be a minyan today.'



	def send(self):

		s = SMTP()

		expand = safedict({'HAVE_COUNT':self.count, 'NEED_COUNT':self.count < 10 and 10-self.count or 0, 'CUTOFF_TIME':CUTOFF_TIME, 'TALLY_URL':TALLY_URL})

		s.connect()

		for recipient in (self.recipients):

			my_print('Mailing', recipient)

			expand['EMAIL']=recipient

			self.replace_header('To', recipient)

			self.replace_header('Subject', dstring(self.text[0]) % expand)

			self.plain.set_payload(dstring(self.text[1]) % expand)

			self.html.set_payload(dstring(self.text[1]) % expand)

			s.sendmail(self['From'], recipient, self.as_string())

		s.close()



re = compile(r'[\w\.\-+=!%]+?@[\w\.\-+=!%]+')

late_time = parse_time(LATE_TIME).time()

cutoff_time = parse_time(CUTOFF_TIME).time()

yidden_file = open(YIDDEN_FILE)

message = MinyanMessage([line[:-1] for line in yidden_file])

while True:

	decliners = ()

	page = urlopen(TALLY_URL, proxies=PROXY)

	page_contents = ''.join(page.read())

	accepters = re.findall(page_contents)

	message.count = count = len(accepters)



	if count == 0:

		my_print('No one has signed up for minyan yet.')

		sleep(INTERVAL*60) #wait before polling again instead of spinning

		continue



	previous_count = 0

	try:

		if date.fromtimestamp(getmtime(PREVIOUS_COUNT_FILE)) == date.fromtimestamp(time()):

			previous_count_file = open(PREVIOUS_COUNT_FILE)

			previous_count = int(previous_count_file.read())

	except:

		pass

	previous_count_file = open(PREVIOUS_COUNT_FILE, 'w')

	previous_count_file.write(str(count))

	previous_count_file.close()

	

	now = datetime.now()

	if now.time() > cutoff_time:

		if count < 10:

			message.cancelled(accepters)

		elif previous_count < 10:

			message.got_it(accepters)

		break



	interval_to_sleep = now.time() < late_time and INTERVAL or LATE_INTERVAL

	if (now+timedelta(minutes=interval_to_sleep)).time() > cutoff_time:

		is_last_call=True

	else:

		is_last_call=False



	if previous_count >= 10 and count < 10:

		message.lost_it(accepters, decliners, last_call=is_last_call)

	elif previous_count < 10 and count >= 10:

		message.got_it(accepters, decliners)

	elif previous_count < 10 and count < 10:

		message.not_got_it_yet(accepters, decliners, last_call=is_last_call)



	my_print('Sleeping for', interval_to_sleep, 'minutes...')

	sleep(interval_to_sleep*60)

Python Warts

Constructors in default arguments are only evaluated once
Nobody uses super.__init__() properly
It's weird to have a single immutable collection type (tuple)
It's confusing having both re.match and re.search
Iterable strings cause hard-to-find bugs, when list(mystring) would be sufficient
Building a single element tuple requires a trailing comma
Exceptions should go in a namespace
There's no true ternary operator
The super() call requires specifying the class (and self)
Having both staticmethods and classmethods is confusing
There's currently no way to limit extreme dynamic behavior
Regular functions should obviate "unbound methods"
"lambda" and "def" use different syntaxes
Iterators aren't used enough
Comparing different types returns false instead of raising an exception
print should be a builtin function, not a statement
Overriding all operators is not yet supported; it would allow something like LINQ

Many of these are slated to change, as described in the Python 3000 PEP.
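
A few of these warts are easy to demonstrate. Here's a minimal sketch, written for a current Python 3 interpreter (where these three still apply). The function names are made up for illustration.

```python
def append_item(item, bucket=[]):
    # The default list is built once, when the function is defined,
    # so every call that omits `bucket` shares the same list.
    bucket.append(item)
    return bucket

first = append_item("a")
second = append_item("b")   # same list object as `first`

# Parentheses alone don't make a tuple; the trailing comma does.
not_a_tuple = ("minyan")
a_tuple = ("minyan",)

def count_names(names):
    # Intended for a list of names, but a bare string is iterable too,
    # so passing one silently counts its characters instead.
    return len(list(names))
```

Calling count_names("josh") raises no error even though a list was expected, which is what makes this kind of bug hard to find.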

Tuesday, June 29, 2004

Wednesday, June 23, 2004

Object Oriented Design

Design Principles and Design Patterns is a good paper. This entry summarizes the "Principles" part of that paper.

Object oriented programming is all about using old code to do new things. It should be easy to add new functionality without breaking the old. It should also be easy for other projects to reuse code without a lot of extra effort.

This is a problem of dependency management, which may be solved using encapsulation. Encapsulation enables specifying and reducing dependencies between objects.

If code is properly modularized, its behavior can be changed by extending it instead of modifying it. (This is referred to as the "Open Closed Principle".) If code isn't modified, then code doesn't break.

When does it make sense to implement a particular abstraction, or extend a particular class? The new class should be usable anywhere the abstraction (or parent class) occurs. This is the "Liskov Substitution Principle". You might think, for example, that a Square would be a natural subclass of a Rectangle. However, this potentially violates Liskov. Since the Square is more constrained than the Rectangle, it can't necessarily be used in every place that a Rectangle is used. (Presumably, in Java you could make it right by declaring that the superclass setLength() method may throw ReadOnlyWriteAttemptException.)
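
Here's a rough sketch of the Square/Rectangle problem, in Python rather than Java to keep it short. The class layout and the stretch() caller are my own illustration, not taken from the paper.

```python
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def set_length(self, length):
        self.length = length


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)

    def set_length(self, length):
        # To stay square, changing the length must also change the width.
        self.length = length
        self.width = length


def stretch(rect):
    """A caller written against Rectangle: it assumes that changing
    the length leaves the width alone. Returns whether that held."""
    old_width = rect.width
    rect.set_length(rect.length * 2)
    return rect.width == old_width
```

stretch() holds for a Rectangle but not for a Square, so a Square can't stand in for a Rectangle here.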

It's impossible to structure code in such a way that extension can accomplish any possible change, so the designer must anticipate the most desirable changes. It helps to keep abstractions as small as possible.

Monday, June 21, 2004

Semweb and Presentation

Some leaders of the movement think that bootstrapping is the greatest remaining obstacle to the semweb. If that's true, then why is everyone neglecting the greatest single obstacle to the bootstrapping process... ?

Why isn't there any easy-to-use framework for presenting RDF on the web? Shouldn't the semweb be able to achieve at least some of the actual results of the existing web? Shouldn't it support all the comfortable old web data, in addition to the annotations, agents, and allegories?

Here's an example: I maintain a site that lists the synagogues in my neighborhood. It has a page for each synagogue and it has schedules of events. Many synagogues have similar pages. Now these web pages are going to be the most popular representations of the data for a long time to come, but I'd jump at the chance to run the site using RDF.

All I need is a simple way to maintain my simple, static web presentation. Even before there's a standard synagogue ontology, and before I can write code to graph the times that congregations across the country begin to pray every morning, I'd use RDF. Maybe that'd even give me an easy way to list the schedule for the whole neighborhood on one page, and for each individual synagogue on its own page.

I don't want XSL, and I don't want a full-blown content management system. I want a simple way to start using RDF to back my web sites.

Minimal Semantic Web

The Semantic Web project is an effort to remove arbitrary and wasteful barriers between applications. We make it possible for them to share some structure, and have partial understanding of data created elsewhere.

Wednesday, June 16, 2004

Java Constructors Suck

Subclasses' constructors can't catch or override exceptions thrown by Superclasses' constructors
Subclasses must explicitly implement each constructor of their Superclasses, even if they don't want to override
Interfaces can't specify constructors
Reflection doesn't expose constructor parameter names, and there is no language support for describing correspondence between constructor parameters and getter/setter attributes

Tuesday, June 15, 2004

Fedora Core 2

Beautiful. So beautiful that my list of complaints is short:

gnome keyboard switcher applet doesn't work
up2date is broken (though command line "yum" works well)
the "Run Application" dialog can't be dismissed by pressing the escape key
still bundles mozilla instead of firefox
laptop power management isn't ready for prime time
no ntfs support out of the box
poor multimedia out of the box
volume buttons and trackpad scroller don't work

The first three only happened on my laptop; they worked fine on my desktop.

By beautiful, I mean fast, smooth, and consistent, with important applications working out of the box (mozilla, evolution, and openoffice). And it has lovely Hebrew fonts.

Monday, June 07, 2004

Minyan Sign-in

There've probably already been a bunch of implementations of minyan sign-in web pages. They're useful when you don't always get a minyan, and you don't want to have nine people show up and wait for no reason. They're less useful because no one actually wants to sign in.

A couple of features I understood from the beginning: it should be as easy as possible to sign in (with cookies for example), and people should be able to withdraw their commitments.

I just thought of a major improvement:

sign-in ids should be email addresses
after a specified time, addresses that have not signed-in should get periodically reminded via email
whenever a minyan is achieved or lost, addresses that have signed-in should get emailed

So there are two incentives to sign in:

you get notified about whether there'll be a minyan
you don't get annoying reminder emails

Would only one of these incentives be sufficient? Neither is excessively onerous because:

if you really hate email reminders, you can just go to the site and mark yourself "can't make it" even if maybe you can.
if you want to know the status of the minyan, you can view it directly on the site

Changes to minyan status probably shouldn't be mailed out immediately (at least earlier in the day) because the status could naturally change a lot on its own. For example, if ten people sign in at 10am and one drops out at 10:15, you have a reasonable chance of getting a replacement by 11.

In order to make it easiest to sign in, the form should also be inlined in the reminder emails.
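
The notification rules above can be sketched as a single decision function. This is only an illustration of the design: the function name, the return labels, and the choice to compare against the count from the previous check are all assumptions.

```python
MINYAN = 10  # people needed

def who_to_email(previous_count, count, everyone, accepters, decliners):
    """Decide which kind of mail to send after a periodic check,
    and to which addresses. Returns (kind, set_of_addresses)."""
    everyone = set(everyone)
    accepters = set(accepters)
    decliners = set(decliners)
    had = previous_count >= MINYAN
    have = count >= MINYAN

    if have and not had:
        return "achieved", accepters   # tell the people who signed in
    if had and not have:
        return "lost", accepters       # tell the people who signed in
    if not have:
        # Remind only the addresses that haven't answered either way.
        return "reminder", everyone - accepters - decliners
    return "no change", set()
```

Because the check runs periodically rather than on every click, a brief dip below ten that gets filled before the next check never triggers a "lost" email.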

Tuesday, May 25, 2004

Horrible SQL

Get the aggregate of one column, but only if another column has a null

select !count(stopped) as continuously, min(started) from someTable where ownerId = ? having count(*) <> count(stopped)
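
The "!" operator above is MySQL syntax. To try the idea elsewhere, here's a sketch using Python's sqlite3 module against an in-memory table. The sample rows are invented, and the query is adapted rather than copied: it uses "count(stopped) = 0" in place of "!count(stopped)", and adds an explicit "group by", which some SQLite versions require before "having".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table someTable (ownerId integer, started text, stopped text)")
conn.executemany("insert into someTable values (?, ?, ?)", [
    (1, "2004-01-01", "2004-02-01"),
    (1, "2004-03-01", None),          # still running: stopped is null
    (2, "2004-01-15", "2004-04-01"),  # owner 2 has no open-ended rows
])

QUERY = """
select count(stopped) = 0 as continuously, min(started)
from someTable
where ownerId = ?
group by ownerId
having count(*) <> count(stopped)
"""

def open_ended(owner_id):
    # count(stopped) skips nulls, so it differs from count(*) only when
    # at least one of the owner's rows has a null `stopped`.
    return conn.execute(QUERY, (owner_id,)).fetchone()
```

open_ended() returns a row only for owners with at least one null "stopped"; for everyone else it returns None.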

Sort reverse chronologically, but with nulls at the beginning

select * from someTable order by if(dateField,dateField,"9999-12-31") desc
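
The if() function above is MySQL-specific. A variation that avoids the sentinel date is to sort on "is null" first. Here's a sketch using Python's sqlite3 module, with invented sample rows and an id column added only so the rows can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table someTable (id integer, dateField text)")
conn.executemany("insert into someTable values (?, ?)", [
    (1, "2004-05-25"),
    (2, None),
    (3, "2004-01-01"),
])

def newest_first_nulls_leading():
    # `dateField is null` evaluates to 1 for nulls and 0 otherwise, so
    # sorting it descending puts the nulls ahead of every real date.
    rows = conn.execute(
        "select id, dateField from someTable "
        "order by dateField is null desc, dateField desc")
    return [date_field for _id, date_field in rows]
```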

Delicious

http://del.icio.us/ is incredibly cool.

Thursday, May 20, 2004

Presentation Done Right

WYSIWYG is the right way to do presentation (visual rendering of data). This stuff does not belong in code. That a given widget is five feet tall and bright pink does not belong in code. It does not belong in a specially formatted template. It doesn't even belong in a property sheet. It belongs in a directly presentable document, which is preferably maintained with a WYSIWYG tool. Once this tool exists, there is zero advantage to doing it any other way.

This is a solution for at least three classes of presentation: web pages, print, and desktop applications. This is the major innovation of XUL and XAML. The gui builder doesn't generate code directly, but generates a document which contains only presentation information.

The right way to do it is with Push MVC. Push MVC means that the legal static presentation documents are manipulated by external code. Pull MVC means code is embedded in pseudo-documents, and it's a broken concept. Even if the WYSIWYG tool can be prevented from choking on or breaking the embedded code, it obscures the presentation, and it potentially confuses the people who work on the presentation.
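
As a rough illustration of the push style, here's a sketch in Python using only the standard library's xml.etree.ElementTree. The template, the element ids, and the render() function are all hypothetical; the point is that the template is a legal, directly viewable document with example content, and the code lives entirely outside it.

```python
import xml.etree.ElementTree as ET

# A static, well-formed page that a designer can open as-is.
# The example text shows what real content will look like in context.
TEMPLATE = """<html><body>
<h1 id="name">Example Synagogue</h1>
<ul id="schedule"><li>Example event, 7:00</li></ul>
</body></html>"""

def render(template, name, events):
    root = ET.fromstring(template)
    # Index the elements that the code is allowed to fill in.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

    by_id["name"].text = name

    schedule = by_id["schedule"]
    example_item = schedule[0]          # keeps the designer's tag/attributes
    for child in list(schedule):
        schedule.remove(child)          # drop the example content
    for event in events:
        item = ET.SubElement(schedule, example_item.tag, example_item.attrib)
        item.text = event

    return ET.tostring(root, encoding="unicode")
```

Nothing in the template would confuse a WYSIWYG tool, because there's nothing in it except presentation.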

The code itself should generate template documents for each presentation element, framed by example presentation to show what the individual elements will look like in context. For HTML at least, the template links could be generated to work properly within the template.

It would be nice if there were a simple language for specifying bindings between presentation documents and data documents. For simple cases, actual code need not be necessary at all. (For complex cases, not only is code necessary, but an actual presentation api is also needed. For example, you may want a widget to change colors based on system status.)

Except for desktop applications, we're already close to having a very sweet WYSIWYG tool. Openoffice has a beautiful format which largely uses standard sub-formats for presentation (SVG, XSL-FO). The problem is that it can't yet export to a purely standard format document. The advantage of the purely standard format document, of course, is that it can be transformed to html or pdf without launching the full openoffice application.

XMLC appears to be the only implementation of these ideas so far. Zope page templates don't seem to have it right; they have a lot of cruft in the presentation document. If nothing else, this confuses people about the role of that document.

Monday, May 10, 2004

Old friends and binary subscriptions

Blogs should be a great medium for keeping up with old friends. For those friends that you don't have time to correspond with, your blog gives a nice window into what you're up to.

It would be nice if subscriptions weren't so black-and-white. Currently subscribing to a blog is a relatively large commitment. If I find it boring, I don't want to have to always explicitly dismiss its unread entries. Blog viewers should sort in order of interest, and automatically expire old entries which are not ranked sufficiently interesting.

This is an especially good idea because it creates a practical incentive for ratings.

Friday, May 07, 2004

Gubbish

Danny Ayers is probably referring to "gubbled", from Martian Time-Slip. I wonder if blogspot or weblogs.com automatically sends something like a trackback.

Thursday, May 06, 2004

Another cute feature

Your email client's "attach" button would automatically upload to the web, and insert a link to the uploaded file (instead of actually attaching it).

The web upload area, username, password, etc. should only need to be configured once, just like other account settings. Conceivably, the server for that upload area could support webdav or deltav, to enable your recipients to make revisions to the "attachment" that you sent them.

I had this idea last year, though Jon Udell wrote about it recently.

Cute authoring feature

How about automatically substituting a typed phrase with a corresponding bookmark link?

For example: I have a bookmark labelled "grandma's delicious cookies". Wouldn't it be cool if when I typed "grandma's delicious cookies" (in a blog, email, or im), the os would insert <a href=http://whatever>grandma's delicious cookies</a>? Even better if "grandma's cookies" would get the same treatment.
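
Here's a minimal sketch of the exact-match case in Python. The linkify() name and the dictionary of bookmarks are assumptions, and the fuzzier "grandma's cookies" matching is left out.

```python
import re

def linkify(text, bookmarks):
    """Replace each bookmark label found in `text` with a link.
    `bookmarks` maps label -> url; matching ignores case."""
    if not bookmarks:
        return text
    lookup = {label.lower(): url for label, url in bookmarks.items()}
    # Try longer labels first so a short label can't pre-empt a longer one.
    labels = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(label) for label in labels),
                         re.IGNORECASE)

    def replace(match):
        phrase = match.group(0)
        return '<a href="%s">%s</a>' % (lookup[phrase.lower()], phrase)

    # A single pass, so text inside inserted links isn't matched again.
    return pattern.sub(replace, text)
```

For example, linkify("try grandma's delicious cookies", {"grandma's delicious cookies": "http://whatever"}) wraps the phrase in a link and leaves the rest of the sentence alone.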

Introduction

Hello world. I'm interested in the semantic web and programming languages. I write java for a large financial institution. I'm an orthodox jew. I'm a young husband and father. That's in descending order of expected frequency of posting :). I am starting a blog because apparently that's where the conversation's at. Nice to meet you.

I'm using blogger because a proper semantic blogging tool, along the lines of syncato, isn't available yet. Blogger is pleasantly easy to set up and use, though. It would be a great vehicle to introduce the blogging world to some semweb features, for example, a nice quoting mechanism (like Phil Windley's bookmarklet). I'm writing pseudonymously because the transparent society has not yet arrived.

It's odd how far we are from the semantic web vision. There isn't even a standard rdf query language, nor a standard way to serve up rdf on the internet. Could much be gained from separating the w3 research effort from the w3 standards effort?