Friday, May 19, 2017

DDD Validation

How should we implement validation for Domain Driven Design?

Our first thought is to put validation in the entity. After all, entities are in charge of maintaining their invariants. They are the heart of our business logic, and validation can also be pretty fundamental to our business logic.

One disadvantage of validating in the entity is that the entity can grow too large. Another disadvantage is that validations often require access to services, so putting those in the entity is not Clean Architecture. The third disadvantage is that entity methods do not naturally provide a powerful enough API. Throwing exceptions does not reasonably handle results for more than a single validation. It's too easy for calling code to neglect to properly call pairs of methods for validation or to check method return values, and these calling conventions detract from the proper focus of an entity.

E.g.

class Creature {
 public void eat(Snack v) {...} //invariant maintained using types
 private void setBloodSugarLevel(int i) {...} //invariant maintained privately
 public void eatIfValid1(Object o) throws RuntimeException {...} //no
 public void eatIfValid2(Object o) throws EatingException {...} //no
 public ValidationResults eatIfValid3(Object o) {...} //no
 public ValidationResults validateEat(Object o) {...} //no
}

The rest of this post describes a different approach to validation, which solves these problems.

We code each validation in a class by itself, thereby satisfying the Single Responsibility Principle. All the validations of an object should implement a common interface. Two interfaces are better than one here; use one interface for indicating that the object is invalid, and another for providing information regarding how or why it's invalid (SRP again). Not only does this help share code for generating validation results, it also causes your code to be cleaner for the cases in which the result is specific to the validation.

E.g.

@Singleton
class ComfySnackValidation implements Predicate<Snack>, Function<Snack, ValidationResult> {
 @Inject
 WeatherService weather;

 public boolean test(Snack snack) {
  int temperature = weather.getCurrentTemperatureInFahrenheit();
  return temperature < 68 || 78 < temperature; //true means the snack is invalid
 }

 public ValidationResult apply(Snack snack) {
  return new ValidationResult(getClass().getSimpleName());
 }
}

There are two important aspects to this approach:
1) we validate whole objects and not individual method calls, and
2) we allow creating invalid objects.

Validating anything other than whole objects requires one of the inelegant APIs mentioned above. Validating only whole objects enables us to leverage the type checker, as we'll see in the next post. The objects that we validate may be entities or value objects. They may be "command objects", that exist solely to serve as arguments to a single method. Often, the object needs a reference to another object which is already valid and persisted. This is fine, so long as nothing in the persistent object graph yet refers back to the new object, the object which is not yet known to be valid.

Creating invalid objects is especially compelling in Java, which doesn't yet support named parameters, and for which entity builders can be challenging. Even in languages which do support named parameters, we often want to use the actual object before we know it's valid, consulting it in defaulting and validation logic. We may even want to publish invalid objects, and it’s better to not have two different code paths for publishing the same fields.

We can achieve “correctness by construction”; there should be no reasonable way to call the domain incorrectly. We can achieve this without the entities having to know about each validation. The essence of the design is that a factory injects a collection of validating services into the object to be validated.

e.g.

@Singleton
public class SnackFactory {
  private Validator validator = new Validator();

  @Inject void setComfyValidation(ComfySnackValidation v) {
    validator.add(v);
  }

  ...other validations to inject...

  public Snack create() {
    return new SnackImpl(validator);
  }
}

With a small generic ValidatorImpl, the boilerplate that we need in the validated object is small:

e.g.

class SnackImpl implements Snack {
 private Validator validator;

 public SnackImpl(Validator validator) {
  this.validator = validator;
 }

 public Snack validate(ValidationResults results) {
  return validator.validate(this, results);
 }
}

Here is an example of a generic validator to support this pattern.
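As a sketch, a generic Validator might look like the following. Only the add/validate API comes from the examples above; the minimal ValidationResult and ValidationResults classes here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class ValidationResult {
    private final String validationName;
    ValidationResult(String validationName) { this.validationName = validationName; }
    public String getValidationName() { return validationName; }
}

class ValidationResults {
    private final List<ValidationResult> results = new ArrayList<>();
    public void add(ValidationResult result) { results.add(result); }
    public boolean isEmpty() { return results.isEmpty(); }
}

class Validator<T> {
    // each validation is both a Predicate (does it fail?) and a Function (why?)
    private final List<Predicate<T>> checks = new ArrayList<>();
    private final List<Function<T, ValidationResult>> reporters = new ArrayList<>();

    public <V extends Predicate<T> & Function<T, ValidationResult>> void add(V validation) {
        checks.add(validation);
        reporters.add(validation);
    }

    // runs every validation, collecting results; returns the object for chaining
    public T validate(T object, ValidationResults results) {
        for (int i = 0; i < checks.size(); i++) {
            if (checks.get(i).test(object)) // true means invalid, as in ComfySnackValidation
                results.add(reporters.get(i).apply(object));
        }
        return object;
    }
}
```

The intersection type on add() is what lets each validation class implement the two small interfaces rather than one big one.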

Next post will discuss how the type checking works.

Thursday, May 18, 2017

DDD Entities

In Domain Driven Design, it's all about the entities.

Entities are the things in your software users' mental model that change.

In Clean Architecture, your entities are independent of all of the rest of your software.

All the rest of your software is defined mostly in relation to entities. Repositories are collections of entities. And nothing else in DDD software changes over time.

Entities are the genuine objects of Object Oriented Programming.

Your software should only change entities by calling their methods, and never by directly modifying their internal state.

E.g. animal.eat(grass) and not animal.setBloodSugarLevel(100)
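A minimal sketch of the same idea; Grass and the blood sugar numbers are invented for illustration:

```java
class Grass {
    int sugarContent() { return 10; }
}

class Animal {
    private int bloodSugarLevel = 90; // internal state, never set directly by callers

    // the entity changes only through behavior from the user's mental model,
    // and maintains its own invariant (here, a cap on blood sugar)
    public void eat(Grass grass) {
        bloodSugarLevel = Math.min(bloodSugarLevel + grass.sugarContent(), 120);
    }

    public int bloodSugarLevel() { return bloodSugarLevel; }
}
```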

Thursday, May 11, 2017

High level code, great performance

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/destination-passing-style-1.pdf

Exciting!

Sunday, March 19, 2017

Slicing the cake and different products

The concept of "slicing the cake" is one of the most important lessons to come out of the agile movement. People consistently neglect to apply it across products...

One solution is to have a single developer make all the changes across products. This makes sense if all the products use the same infrastructure, or all have high standards of documentation and devops. E.g. it doesn't work well if different products have different build processes that all require manual intervention. When we reduce barriers to entry this way, "ownership" of individual products might then be meaningful only for code reviews and maintaining long term health of each code base.

The only other solution is to have developers on each of the products all working together to integrate early. If your sprints are two weeks long, each developer gets one week to build an initial implementation, and must be available to integrate immediately on the first day of the second week. Everyone should expect developers to refactor their initial implementations afterwards.

Sources of technical debt

All coding smells get worse over time until they are corrected.

For example, an awkward abstraction might not seem so bad when it is introduced, but as more features are added, it gets increasingly brittle and creates more bugs.

In practical software development these are the most common sources of technical debt:

YAGNI (You aren't gonna need it.)

People build things that don't end up being needed at all, or more insidiously people build things that have an overall cost which exceeds their benefit.

Adapting

Existing code is organized into two parts A and B, and a new use case comes along that needs mostly B. That is, the interface to use B is awkward. Rather than improve that interface, people have their code C start using B via an extra piece of code, a B-C adapter.

Adapters are necessary when the adapted code is owned by a different team, but in that case you'd hope that the interface is well designed in the first place, or at least the integration is part of the application's job. When all the code is owned by a single team, adapters are just debt.

Special casing

A change is desired for a single specific case. The change could apply more generally, but it isn't urgent. People add new code that runs just for the specific case, because they are afraid.

This source of technical debt is particularly tempting. And sometimes it's hard to distinguish from properly avoiding YAGNI. Just as with all refactoring, good automated tests are essential.

Wednesday, February 08, 2017

Schmoperties

Schmoperties is the best of the two Java configuration APIs.

Monday, January 16, 2017

In praise of kafka

Messaging is great because it reduces coupling.

Kafka does it even better.

A message consumer can come up and work just fine after being down all week.

Cool unhyped languages

Pony - the best of low level and high level
Zig - more pragmatic than c
Lamdu - types are friendly and easy
Crema - program without turing completeness

Monday, October 10, 2016

Easier git squash

The traditional way to squash multiple commits into a single commit is to use interactive rebase.

That way involves a lot of merging work, one merge for each commit.

You can avoid all of that work by instead using git checkout <branch> -- <file>.

E.g. you'd like to merge feature/foo onto master using a single commit:

git fetch origin master
git checkout -b feature/foo-squashed origin/master
rm -rf *
git checkout feature/foo -- .
git commit -a
git push origin feature/foo-squashed:feature/foo

Monday, July 04, 2016

Cool java tools

Power

Dagger
Mapstruct
Autovalue

Safety

Error-prone
Pure4j

Writing your own

Javapoet
Autoservice

Release workflow

This revision control workflow is designed to keep you sane.

There are just two types of long lived branches:
  • master - This branch lives forever, and all changes are eventually merged to it.
  • release branches - These are branched from master for each major release candidate.

Master

New features are implemented against master, by merging feature branches into it. Feature branches should be as short lived as possible, because keeping them up-to-date is expensive. If your feature can't be implemented within a single sprint, consider branching by abstraction.

Release Branches

Release branches are also known as "stabilization branches". Bugs that should be fixed for a release are first merged into the corresponding release branch, and then merged forward to all later releases and then to master.

Change as little code in release branches as possible, because merging to code that has later changed is expensive.

Each published build of a release branch should be tagged so that it can be easily identified and ordered.

Example

In June, we branch master to release/1, and start testing it.
In July, we branch master to release/2, and start testing it.
In August, we find a bug in release/1.
We branch release/1 to bugfix/555, and fix the bug on the bugfix/555 branch.
We merge bugfix/555 to release/1, to release/2, and to master.

Feature development in July and later does not create risk for release/1, and even bugfixes for release/2 do not create risk for release/1.

Tracking Bugfixes

It is straightforward to automatically merge most changes from older releases to newer, and to report regarding changes that have not yet been merged. We want to avoid regressing a bugfix in a later release just because we forgot to merge.

Using the merge history to keep track of what fixes have been applied to releases sometimes requires doing a "trivial merge". Even when there is no change to be made to the later branch, we still have to merge, to inform the revision control software (e.g. git) that the bug is fixed in the later branch.
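Using the branch names from the earlier example, a trivial merge is a sketch like this; the "ours" strategy records the merge in history while keeping the later branch's tree unchanged, so it's only appropriate when there is genuinely nothing to take:

```shell
# release/2 already contains the fix, so there is nothing to change --
# we merge anyway, purely to record that bugfix/555 is in release/2
git checkout release/2
git merge -s ours release/1
```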

Cherry picking

If we merge a bugfix into a release branch and only realize afterwards that it should've been fixed in an earlier release, we must cherry pick, and trivial merge as described in the previous section.

Hotfix Branches

It is good to decouple bug fixing from the choice of exactly which build is deployed into production, especially when there are multiple production environments that have different risk profiles. Therefore, the production tag could be different from the tip of the release branch. When this happens and we want to make an urgent fix for that production environment, we don't want to jump to the tip of the release branch. We haven't yet tested all the changes in the release branch, so we hotfix branch from the tag that we have tested.

Example

In September, we deploy the tag 1.2 of release/1 to the London production environment.
In October, we make five bugfixes to release/1 in preparation for its 1.7 New York release.
In November, we discover a critical global bug in release/1.
We branch tag 1.2 to hotfix/1.2-LN, and fix the bug on the hotfix/1.2-LN branch.
We deploy tag 1.2-LN1 of hotfix/1.2-LN to the London production environment.
When we have leisure, we merge hotfix/1.2-LN to release/1, etc.

Advanced Topics

Gitflow and ProdFlow

The "gitflow" workflow from nvie has two problems: it confusingly changes the meaning of the "master" branch, and it doesn't clearly justify the cost of an additional long lived branch. Both problems could be resolved by tweaking the workflow to support multiple production environments. This is valuable if you actually need to support multiple production environments :). Let's call this improved workflow "prodflow".

Prodflow has release branches and master-as-trunk. Instead of one branch for release history, it uses a branch for each production environment, prefixed with "prod/". So when we deploy release/1 to production in London, we merge it to prod/LN.

An advantage of this approach is that it simplifies hotfixes. Neither gitflow nor prodflow really need hotfix branches; you can just merge bugfixes first into the prod branch and then into release branches and master. We don't need to keep track of the version in production.

Instead of: "
We branch tag 1.2 to hotfix/1.2-LN.
We branch hotfix/1.2-LN to bugfix/777...
" in the example above, that'd be just "We branch prod/LN to bugfix/777...".

Deserializing Merges

When there is a conflict, merging bugfixes from one release branch to the next can be a pain. Developers might not do it quickly, and this blocks merging subsequent bugfixes. We accumulate a backlog of unmerged changes, and it becomes increasingly onerous to work it down, especially with proper code review.

The solution is to "deserialize" merges. When bugfixes are made to different parts of the code, one needn't be blocked by the other. We can convey this to git using "rebase --onto", so that the parent commit of the bugfix branch is a commit that is already present on all the long lived branches. If we didn't rebase, ordinarily the parent commit of the bugfix branch would be the tip of its release branch, and the author of that commit might be procrastinating her merge.

Example

We merge bugfix/888 to release/1 but not to master.
Now we'd like to merge bugfix/999 to release/1 and master without being blocked by bugfix/888.
Before merging to release/1, we run: git rebase --onto origin/master release/1 bugfix/999
And it Just Works!

Conclusion

So that's the git release voodoo I learned over the last couple of years. I hope you find it useful!

Wednesday, November 18, 2015

Posting cljs to get around an unkind firewall.

Sunday, December 28, 2014

Extensible Software and Dependency Injection

tl;dr For Dependency Injection of a collection of implementations of an interface, just inject the concrete classes of those implementations.

Supporting change is a big challenge in programming. We want to make the most likely kinds of change quick and easy. How do we do that with Dependency Injection?

So if you have code that does five things that all have the same pattern, you'd like to be able to easily add a sixth. Common examples are programs pulling from multiple sources of data, and programs performing multiple validations. The technical name for this kind of thing is the Open Closed Principle.

If your five things are complicated, their code might organically spread out over your system...
setup();
doWork();
cleanup();

void setup() {
    setupForDataSourceA();
    setupForDataSourceB();
    ...
}

void doWork() {
    workForDataSourceA();
    workForDataSourceB();
    ...
}

...
The idiomatic way to handle this in java is with interfaces:
interface DataSource {
    void setup();
    void doWork();
    void cleanup();
}
So each data source implements the interface, and many parts of the general code loop over a set of instances of the interface.

for (DataSource datasource : datasources) {
    datasource.setup();
}
for (DataSource datasource : datasources) {
    datasource.doWork();
}
...
If you're using Guice for your dependency injection, when you hear the word "set" you might be tempted to use multibindings.

Or you might think, "wouldn't it be cool to add a new datasource without having to change any existing code". You could write a whiz bang code generation framework that automatically wires up implementations, either purely because they implement the interface or when they are additionally annotated.

Keep it simple sweety! Remember the original goal: making software easy to change. When the software is too clever, it becomes harder to change over time, especially as multiple programmers work on it.

There's a simple way to do it in Guice:
@Singleton
class DataSources {
    Set<DataSource> datasources = new HashSet<>();

    public Set<DataSource> getDataSources() {
        return Collections.unmodifiableSet(datasources);
    }

    @Inject void setDataSourceA(DataSourceA datasource) {
        datasources.add(datasource);
    }

    @Inject void setDataSourceB(DataSourceB datasource) {
        datasources.add(datasource);
    }

    ...
}
This works without any additional binding configuration, using Guice's just-in-time bindings support for eligible constructors.

This wiring is vanilla Java rather than Guice's embedded domain specific language. These implementations aren't typical injected dependencies, because the design of the application requires that you run lots of them.

Don't be scared by the idea of a concrete class, because the framework code still only ever relies on the interface. The DataSources class above has the same testing burden as Guice modules. If you don't unit test your Guice modules, don't unit test these classes either.

In addition to being simpler than multibindings, this pattern is much more explicit, because the full configuration is in one place. Multibindings allow multiple Guice modules to add to the one binding.

It's one of the few cases where setter injection is best, though it works fine with field and constructor injection too. Setter injection enables you to add new instances by editing a single location in the composing class. It also allows you to simply subclass in order to compose an additional group of instances.
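Subclass composition might look like this sketch; the DI annotations are left as comments so it stands alone, and DataSourceC is hypothetical:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

interface DataSource {
    void doWork();
}

class DataSources {
    protected final Set<DataSource> datasources = new HashSet<>();

    public Set<DataSource> getDataSources() {
        return Collections.unmodifiableSet(datasources);
    }

    // @Inject setters for DataSourceA, DataSourceB... as in the post
}

// an additional group of instances, composed by subclassing;
// nothing in the parent class needs to change
class ExtendedDataSources extends DataSources {
    // would be @Inject void setDataSourceC(DataSourceC datasource) under Guice
    void setDataSourceC(DataSource datasource) {
        datasources.add(datasource);
    }
}
```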

Thursday, December 26, 2013

Test Driven Development (TDD) recommends first writing a failing test.

Continuous Integration recommends committing early and often.

How do you do both and not break the build?

How do you distinguish between test failures associated with regressions and test failures associated with unfinished development?

Using Junit @Rules!

Instead of just disabling a test using the Junit standard @Ignore annotation, you can annotate a new test using @NotImplemented. The result of these tests will be inverted; the automated build's junit test will succeed if and only if the actual test logic fails.

That way, when the application functionality is still incomplete the actual failing test will not break the build. When the functionality is ready, the inverted test will start breaking the build, so that you don't forget to enable it.

And the @NotImplemented annotation provides a machine-checked way to track known issues.

Here's the implementation:

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;
import java.lang.annotation.*;

/*
 * To use in a testcase, include a line:
 *
        @Rule
        public NotImplemented.MustFail notImplementedRule = new NotImplemented.MustFail();
 *
 * and annotate individual test methods:
 *
        @NotImplemented @Test
        public void someTest() {
                ...
        }
 *
 */

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface NotImplemented {
        String value() default ""; //to enable grouping tests; doesn't affect tests
        boolean invertTest() default true;

        public static class MustFail implements TestRule {
                @Override
                public Statement apply(final Statement base, Description description) {
                        NotImplemented annotation = description.getAnnotation(NotImplemented.class);
                        if (annotation == null || !annotation.invertTest()) {
                                return base;
                        } else {
                                return new Statement() {
                                        @Override
                                        public void evaluate() throws Throwable {
                                                try {
                                                        base.evaluate();
                                                } catch (AssertionError e) {
                                                        return;
                                                }
                                                throw new AssertionError();
                                        }
                                };
                        }
                }
        }
}

Sunday, July 18, 2010

Limited video for kids

Few parents can completely resist employing the video-as-babysitter.

Parents that use linux can easily have their computer play just one movie with the following script:

#!/bin/sh
if [ "$1" ]
then
totem --fullscreen "$@" &
fi
xtrlock

It starts playing the video(s) you want, but then prevents the kids from playing other videos or engaging in other unsupervised fiddling with the computer.

xtrlock needs to be installed, easily for example from the standard Ubuntu Software Center in Ubuntu 10.04.

You can create a Launcher in the panel for this script with a %F parameter, and drag videos to it from the File Browser.

Sunday, July 05, 2009

Google Strategy

Google has three main strategic product categories.

Google has login-based services, like Gmail. These are most like other companies' internet services. They also have the most lock-in effect, though Google expends some effort to reduce it (for example allowing downloading and forwarding away email messages). These services enrich Google search data with user identity data.

Google has internet growth services, like Chrome, Android, and News. In general, companies would like to reduce "substitutes" and increase "complements". Since Google has dominant marketshare of internet search, anything that increases internet use is a complement. Google can run ads on all of these services, and to varying degrees their users are more likely to be Google search users. Developing these services is also a little like Military Keynesianism. It gives Google engineers fun projects to work on, helping Google to get and keep top engineers, who help it to maintain its internet search dominance.

Google has the internet. Since Google is the king of search, it will vigorously defend the internet against closed competing networks like the new social networks Facebook, LinkedIn, and Twitter. At least three distinct Google projects, though they also occupy the second category, attempt to open up the social software world: Social Graph API, Open Social, and Wave. This makes Google "the good guys," because they'll always try to bring the fight to the open web, where they have the competitive advantage. Now, if Google didn't have a challenger, it might just release something into the first category (Orkut for example), but when they do have a competitor, everyone will end up being a lot better off. Wave promises to be a wonderful open technology, though it will put a lot of social software companies out of business.

Thursday, July 26, 2007

Temporal and Bitemporal in One Sentence

Temporal database support allows efficiently answering questions like "What was the state of this entry on Monday?", and bitemporal database support, questions like "What was the state of this entry on Monday, if I had asked on Tuesday?"

Tuesday, June 19, 2007

Distributed Revision Control Yields Centralized Advantages

The new distributed revision control systems (DRCS) have some important advantages for traditional use, apart from the ability to commit without an internet connection.

DRCS make code review much easier. Instead of using an entirely different system for managing the code review process, developers can check all of their un-reviewed changes into a branch, and merge changes as they get reviewed.

DRCS can support graduated continuous integration. In sufficiently large projects, continuous integration can cause frequent build breakage. Instead of having a single, volatile HEAD, subprojects can have their own independent integration branches, which they merge to the superproject integration branch at longer intervals.

DRCS support "lines of development", or working on multiple independent features. With traditional RCS, once a feature is committed it becomes part of the blob, and may not easily be dealt with as a separate identity.

The original motivation for DRCS was empowering decoupled open source contributors, so it's mildly ironic that this functionality should be so valuable for BigCo development as well.

Wednesday, May 30, 2007

Libertarian OS

What would an operating system look like if it were designed to maximize freedom of development? Could we support a web of code to match the existing web of documents?

It'd need:
  • A security framework
  • A dependency framework
  • No extra baggage
For security, programs would be extremely limited by default. They'd have no network, filesystem, or driver access. In order to do anything, they'd use APIs to
  • request reading or writing an existing file, selected by the user
  • request writing a new file
  • suggest a url to visit (optionally with post data).
This restricted behavior could safely allow running programs from even the least trusted sources. It's in the spirit of the "One laptop per child" project's revolutionary security model Bitfrost.

For dependencies, programs should specify urls (possibly self-certifying) of code that they depend on. All code could be cached locally or downloaded anew as desired. This is in the spirit of package management approaches without "side effects", like Nix and Zero Install.

Once there's a sufficiently powerful dependency framework, we need not hard code any dependency on "standard software" like programming languages or gui toolkits. Programs could be transported as LLVM bitcode.

Implicitly trusted programs could be subject to phishing attacks just as websites currently are, so it would be desirable to have some sort of petnames scheme, as well as a spoof-resistant UI for distinguishing programs.

Monday, March 20, 2006

Dinosaurs

My daughter, 4 years old: Daddy, are dinosaurs kosher?
Me: No.
My daughter: Not even little purple ones?

Monday, February 13, 2006

"Model" in MVC

The "Model" in Model-View-Controller architecture does not refer to the application data model. It refers to the user interface model underlying each widget.

Think of a text box that accepts date values. When the user types "2006-" and then pauses, the application doesn't yet have a date, so it doesn't have anything to put in its data model. The string "2006-" isn't stored somewhere in the view; it's stored in the model that backs the UI.
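Swing makes this split concrete: a text component delegates to a Document model, which happily holds partial input like "2006-" with no view attached at all. A minimal headless sketch:

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;

public class WidgetModel {
    public static void main(String[] args) throws BadLocationException {
        // the "model" in MVC: the state backing the widget,
        // not the application's date field
        PlainDocument model = new PlainDocument();
        model.insertString(0, "2006-", null);

        // no valid date exists yet, but the UI model still has state to render
        System.out.println(model.getText(0, model.getLength())); // prints 2006-
    }
}
```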

To integrate your UI and your data model, you've got to use data binding code, like the jgoodies derived data binding code. It would seem that MVC is more about UI framework implementation than interface.

This explains why it's so cumbersome to switch between GUI widgets that would appear to represent the same data, like a select box and a group of radio buttons. It'd actually be against the MVC philosophy to support this, because it'd put artificial constraints on the respective widget models.

Monday, November 28, 2005

Consistent Types

There's a problem with inconsistent downcalling, which I've been hankering after for a bit, and which C# supports using the "new" keyword.

The problem is that extension classes aren't really subtypes of extended classes anymore. They violate the Liskov Substitution Principle.

Interfaces for encapsulation don't have this problem.

(Though it'd be great if all languages had something like c#'s "override" keyword :).)

UPDATE: I was mistaken. Different methods are called on an object in different static "contexts". For example:

((ArtisticCowboy)joe).draw() runs one method, and
((Cowboy)joe).draw() runs another.

This is what C# does, but I don't know if it's wise.

Thursday, November 17, 2005

Future Refactoring UI

Refactoring tools have a good case of programmer automorphism. These tools present silly lists of operations like "decompose conditional" and "encapsulate downcast."

Really, automatic refactoring means the computer automatically makes all appropriate changes to accompany your change. It's all about moving and renaming. The tools should get rid of their complicated interfaces, and just support a mode "don't let me introduce any errors."

Simple changes are easy to spot, but moves need to be done with Cut&Paste. For example, cutting from one place and pasting to another should fix up the pasted code to work in the new location. It should also fix up callers to the code at the original location so that they now call to the new location. The user shouldn't have to specify what type of code is moving, or what type of move it's making.

If this could be made fast enough, programmers might want to always work this way. They'd just have to get in the habit of defining before using, and deleting usages before deleting definitions.

Wednesday, November 16, 2005

Everyone is Like Me, and bad UI

User interfaces suck because programmers think users care about programs.

Users do not care about programs. They think, "Just do it," and, "Get out of my face." Programmers love their programs, so they think everyone else does too. For example, they even write programs that hassle users to make implementation decisions.

In general, every person thinks that everyone is just like himself. Worse, every person thinks that everyone is like himself right now. In the linked example, even the developers won't care about the implementation a year after they've coded.

There is a word for this, though it seems seldom used: automorphism

Sunday, October 30, 2005

Interfaces for Encapsulation

The python people haven't succinctly explained why python doesn't need "private", "protected", and "public". In one sentence, python doesn't need visibility modifiers because...

With sufficiently good refactoring support, encapsulation decisions don't affect code evolution.

Especially according to the XP design principle of collective code ownership, it's easy to change callers of an interface when the interface changes. Though people reading the code will need to understand the changes, that's far more efficient than making artificial barriers to reuse. Encapsulation is powerful; forced encapsulation is onerous.

This is true for a single organization, no matter how large its code base. Once code is "published" however, all bets are off. Code obviously can't be refactored across administrative boundaries. That being the case, we should care about "published" instead of "public".

For encapsulation of published code, separate interfaces are much more useful than visibility keywords. A class with a single public method is better represented by a separate interface containing a single method. When that interface is published, it never changes. In Java for example, the old interface could be kept in a separate jar file, which clients can continue using even after extended versions become available.

A more original idea: we should allow extending code through an interface. Extending through an interface means that new code will never downcall to a method that isn't in the interface. This is an elegant solution to the fragile base class problem. (Someone recently showed me the cute new c# modifiers "new" and "override". They protect against accidental downcalls, but they don't protect against replacement of methods called by collaborating classes.)

Extending through an interface might also help with downcalling into unrelated methods with mixins and multiple inheritance. For example, to create an "artistic cowboy" with multiple inheritance, we'll inherit from "artist" and "cowboy". Though we want his artistic nature to be dominant, we don't want the cowboy method "drawAndShoot()" to downcall into the artist method "draw()". We can achieve this by extending artist through an interface that excludes "draw()".
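A delegation-based sketch of the artistic cowboy (names and return strings are invented for illustration; this is one way, not the only way, to realize "extending through an interface"). Because the Cowboy instance is held behind a reference rather than mixed into the inheritance chain, its drawAndShoot can never accidentally downcall into Artist.draw:

```python
class Artist:
    def draw(self):
        return "sketching a picture"

class Cowboy:
    def draw(self):
        return "drawing a gun"
    def draw_and_shoot(self):
        # self here is always a plain Cowboy, so this stays Cowboy.draw
        return self.draw() + ", bang"

class ArtisticCowboy:
    """Exposes only the interface we choose from each parent."""
    def __init__(self):
        self._artist, self._cowboy = Artist(), Cowboy()
    def draw(self):
        return self._artist.draw()          # dominant artistic nature
    def draw_and_shoot(self):
        return self._cowboy.draw_and_shoot()  # no cross-downcall possible

ac = ArtisticCowboy()
assert ac.draw() == "sketching a picture"
assert ac.draw_and_shoot() == "drawing a gun, bang"
```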

Monday, October 10, 2005

DRM and the Little Guy

Maybe published work isn't best treated like property, but publishers will certainly fight to keep it that way. Physical media have been pretty good at proxying ownership of information. If general purpose computers can't do as well, then publishers will just adopt special purpose devices instead. Besides keeping consumption cumbersome and inefficient, this'll give binding power to corporations, but not to individuals.

A general purpose DRM system can have a significant impact on society. If we're going to allow our computers to enter into contracts for us, we need some safeguards to avoid creating an information dystopia. The restrictions that we allow publishers to place on consumers should themselves be restricted. Generally, it shouldn't be possible for publishers to arbitrarily revoke access to their work. The set of digitally managed "rights" should be standard, easy to understand, and when possible, it should not require an ongoing relationship with the publisher.

This task is more difficult from the technology standpoint, and it'll detract from the generality of the full DRM dream, but it could work.

Friday, October 07, 2005

Money and Politics

So long as the only means of engaging in political dialogue is through purchasing expensive television advertising, money will continue by one means or another to dominate American politics.
--Al Gore

Templates for Environments

(The more surprising points are below.) Templates are useful for maintaining different deploy environments. Any large system needs to have a painless way to run in different environments, at least one "production" environment and one non-production environment. Examples of things that change between environments are: machine names, port numbers, database names.

Manually switching code from one environment to another is tedious and error-prone, but not all tools can make it completely automatic in all cases (for example, enabling stored procedures to run against differently named databases). Theoretically, templates are also useful for unifying environment configuration across languages, but they're really essential for languages that don't support the necessary abstraction.

Even on platforms that have the necessary support, environment management can get gnarly. For example, java has a great configuration mechanism in the form of "properties" and the improved "preferences". If different sets of property files are used, however, it's easy for the non-production property sets to get out-of-date. The solution is to ensure that only a small fraction of properties are environment specific, and put only those in a separate file.

Templates generally break tools for live editing. For example, you can't "alter table" in a test database and easily have that change apply to your template. Or use a gui wizard to maintain templatized xml files for your application servers. We'd like to be able to "round-trip" templates, to be able to automatically generate templates from live files, as well as generate live files from templates. For this to be easy, the templates must meet two requirements:
  1. there must be a one-to-one correspondence between template variables and the values in the environment used to generate templates.
  2. template values must be unique.
An example of #1 is that a test environment must have as many different machine names as there are in production, even if those machine names are just aliases. Otherwise, the test environment will lose information.

An example of #2 is that a machine can't be named "foo" because substituting a template variable like "@FOO_MACHINE@" will also substitute "buffoon" and "foot".

Otherwise, the template generation will have to get complicated.
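A minimal round-trip sketch using the stdlib string.Template (the variable names and values are invented for illustration). The reverse direction only works because the environment obeys the two rules above: each template variable has exactly one value, and no value is a substring of another:

```python
from string import Template

template = Template("db=$DB_NAME\nhost=$APP_MACHINE\n")
prod = {"DB_NAME": "orders", "APP_MACHINE": "prod01"}

# forward: generate a live file from the template
live = template.substitute(prod)
assert live == "db=orders\nhost=prod01\n"

# reverse: regenerate the template from the live file;
# naive substring replacement is safe only under rules 1 and 2
regenerated = live
for var, value in prod.items():
    regenerated = regenerated.replace(value, "$" + var)
assert regenerated == template.template
```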

Tuesday, August 09, 2005

Google Maps

Google maps should support saving and restoring locations.

It should also support taking the union of the sets of locations created by different API sites.

Wednesday, August 03, 2005

Mixin Requirements

Proper mixins require two funny features:
  • never down-calling into unrelated mixins
  • a mechanism for disambiguating constructor parameters
Example 1:
class artist:
    def draw(self): ...
class cowboy:
    def draw(self): ...
    def drawAndShoot(self): self.draw() ...
artisticCowboy = mixin(artist, cowboy)()
artisticCowboy.draw() should call artist.draw,
but artisticCowboy.drawAndShoot() should call cowboy.draw,
without needing to be explicitly specified

Example 2:
class file:
    def __init__(self, root): ...
class plant:
    def __init__(self, root): ...
we need some mechanism for specifying plant's and file's constructor parameters independently

UPDATE: Silly me, number one would be great for multiple inheritance also.

Tuesday, August 02, 2005

Improving the GUI Editor 2

A standard XML format for representing GUIs would be a tremendous boon.

It would solve the biggest problems with current GUI builders: round-trip-ability and lockin.

It would enable a single builder to easily support GUIs that run with different languages and widget sets, including HTML.

The standard should specify the least common denominator, as well as extra features and their fall-backs. It shouldn't specify bindings or fancy template support, but it should specify how builders maintain information for features that they don't handle.

Monday, July 25, 2005

Mixins instead of Traits

On reflection (no pun intended), I can't see what advantage Traits have over mixins in a dynamic language like python. Specifically, python doesn't seem to be stuck with a total ordering.

import copy
def mixin(*classes):
    c = copy.copy(classes[0])
    if len(classes) > 1:
        c.__bases__ += classes[1:]
    return c()

class a:
    def f(self): return "a.f"
    def g(self): return "a.g"
class b:
    def f(self): return "b.f"
    def g(self): return "b.g"
class c:
    def g(self): return a.g(self)

obj=mixin(c, b, a)
assert(obj.f() == "b.f")
assert(obj.g() == "a.g")

Friday, July 22, 2005

Improve the GUI Editor

Beware the GUI editor only for a little while. For one thing, Sun is finally fixing wysiwyg layout with GroupLayout and assorted Swing changes, as demonstrated in NetBeans. For most of Beware's other complaints, all we really need is good template support, just like what we've been using with html for a while.

Just like with html, users usually won't care about the api for manipulating their presentation, but that's OK. (The favored api for html is the javascript dom.)

Monday, July 18, 2005

Python Shell

In the quest for best re-use, I'd like my "shell" programming language to be my main programming language, and to just support some syntactic sugar and extra functionality for interactive use.

Windowing systems obsolesce job control, leaving tab completion, history, and prompting as the important shell functionality. Python's readline bindings can handle the first two.
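A minimal sketch of the completion half with the stdlib rlcompleter module (the name some_convenient_method is made up for illustration). rlcompleter works against a live namespace, which is exactly the limitation noted at the end of this post:

```python
import rlcompleter

# complete against a namespace containing one known callable
namespace = {"some_convenient_method": len}
completer = rlcompleter.Completer(namespace)
match = completer.complete("some_conv", 0)
assert match.startswith("some_convenient_method")

# in an interactive session you would also bind the tab key:
#   import readline
#   readline.parse_and_bind("tab: complete")
```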

LazyPython and IPython both allow omitting parentheses, commas, and quotes around function parameters. They use standard python's sys.excepthook, so that they only extend python syntax that would've been illegal anyway.

This has two problems. First, there are cases that don't work, like calling a method without parameters. Second and more importantly, excepthook still throws an exception, and so discards context. That means you can't do: for i in range(n): some_convenient_method i.

So in order to do it right, we'll have to hack the interpreter, and while we're at it, we could even support working directly with the input. For example, you type some_convenient_method and then a space, and the interpreter inserts a parenthesis. You always have legal python code. IPython prints the legal python code after you hit return, which isn't as nice.

Incidentally, until we have interfaces or inference, tab completion can only work with already instantiated objects. (For example you can't complete list().a to list().append.)

Wednesday, July 13, 2005

Electronic Paper

Should be on my link blog, but wow, Fujitsu is demonstrating electronic paper!

Highlights: as easy-to-read as paper, extremely low-power, flexible, color. Product to be available in one to two years (April 2006 to March 2007).

Friday, June 24, 2005

Is It You?

Rule of thumb:
if you have a bad experience with someone, it's just him.
if you have a bad experience with everyone, it's you.

Tuesday, June 21, 2005

Oppressive Corporate Firewalls

If you're behind an obnoxious corporate firewall, that doesn't allow any internet access except through a web proxy, you can still get out to your unix account.

Putty supports tunnelling SSH through an HTTP proxy out-of-the-box:
  • on your server, add "Port 443" to your sshd_config
  • in putty new session, specify 443 in the port box, and
  • in its Connection/Proxy category, check HTTP and specify your proxy info

Monday, June 20, 2005

Perl Best Practices (for corporate coders)

In general, use perl for glue code and reports, not for business logic.

Perl code should be as simple as possible. That means:
  • Use as few language features as possible.
  • Use as few lines as possible, as long as they're readable.
  • Don't overdesign.
  • Don't try to write perl that looks like java.
Avoid rewriting functionality that is present in the standard library.

Separate functionality out into multiple small scripts, the smallest meaningful rerunnable units. For example, have one script handle generating a report and another script handle distributing it.

Always "use strict".

Avoid using $_ in loops except for regular expressions.

Avoid reversed flow control operators except for error handling ("doSomething or die()").

Avoid redundant comments.

Avoid overloading on context. (e.g. sub fu { wantarray() ? X() : Y() })

Avoid special symbol variables. Use the long versions and "use English" (e.g. "$INPUT_RECORD_SEPARATOR" instead of $/). Comment appropriately.

Avoid using shell tools (e.g. "awk", "grep", and even "date" and "cp"). If perl can do it, do it in perl.

Prefer "my" over "local".

Put "my" declarations in the tightest possible scope.

Have users of modules explicitly import the tokens that they want (e.g. "use SomeModule qw( SomeFunc $SomeVar )").

Avoid writing huge modules with lots of independent functionality; people are afraid to change them.

Avoid writing modules altogether, unless you're absolutely sure that your code will be reused.

If you're using references, have the first character of the expression reflect the type of the expression (e.g. use "$somearrayref->[3]" instead of "@$somearrayref[3]").

If your "configuration" is sufficiently complex (or unchanging), just define it at the top of your script.

Wednesday, June 08, 2005

Next Generation Web

The next generation web will be all about abstractions for handling foreign code.

Of all the desirable features for a development platform, one of the most desirable is the ability to write and deploy code quickly and easily. This is how people explain the success of the web and web applications like the google suite.

The next generation web will protect users even from malicious code without requiring much greater responsibility. The new security will have to support three classes of features:
  • manage state
  • allow the code from one site to affect the user's experience of other sites (like greasemonkey for example)
  • clearly determine what site is responsible for a given interaction (to combat phishing for example)
Users will never be hassled "are you sure you want to install this extension?"; foreign code will have access to a powerful set of operations that yet poses no threat to the user. On the other hand, users will have to learn something of a new approach just to be able to understand what they'll be seeing.

Thursday, May 26, 2005

Another Solution to the Fragile Base Class Problem

Here's a simple python implementation of the solution to the fragile base class problem described in Modular Reasoning in the Presence of Subtyping, and more accessibly in John Cowan's blog. It uses classes in place of "divisions".
class sgd(type):
    def __init__(cls, name, bases, dct):
        overridden_bases = [base for attr in dct.keys() for base in cls.__mro__
                            if hasattr(base, attr) and not attr.startswith('__')]
        required_methods = [(base, m) for base in overridden_bases for m in base.__dict__
                            if m not in cls.__dict__ and not m.startswith('__')]
        if required_methods:
            raise "unimplemented methods:", required_methods
        super(sgd, cls).__init__(name, bases, dct)

class sgd_class(object):
    __metaclass__ = sgd

and then:
>>> class a(sgd_class):
...     def f(self): pass
...     def g(self): pass
...
>>> class b(a): pass
...
>>> class c(a):
...     def f(self): pass
...     def g(self): pass
...
>>> class d(a):
...     def f(self): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "x.py", line 12, in __init__
    if required_methods: raise "unimplemented methods:", required_methods
unimplemented methods:: [(<class '__main__.a'>, 'g')]
>>>

Monday, May 16, 2005

People Don't Read, and What To Do About It

Ever had someone flagrantly not read your email? Try putting the most important stuff at the top.

Not reading is a lot more prevalent than you think. Most people don't enjoy reading. Even those who do have too little time to read everything they'd like.

The solution is to write your emails in order of decreasing importance. Don't write in chronological order, or even in logical order. Employ the journalistic technique of "heads, decks, and leads" or headlines, sub-headlines, and leading paragraphs.

This is most important for messages to groups of people, and for documentation messages. In other situations, there isn't so much motivation to write more text than will be read.

(This is ironic as a blog post; those who see this post presumably do read. But this advice is for you! You don't realize how unusual you are.)

Friday, May 13, 2005

Exceptions Are Good For You

Languages with exceptions are good for you; they make you realize that your code can be interrupted any time.

Consider:
#read user password without displaying on screen
os.system("stty -echo")
print "Password:",
password = raw_input()
os.system("stty echo")
print
What happens if the user changes her mind, and kills the program? All subsequent typing in the shell will be invisible! Checking for errors won't help here.

The only difference in modern languages is that the rest of a program may continue to run even after a function has been interrupted. This creates a new kind of obligation, but not a big one:
  1. you already have to worry about external state without exceptions (above example)
  2. you don't have to worry about local function state, because the exception causes it to be thrown out
  3. you do now have to worry about non-local state
Composable memory transactions would relieve us of even #3; an exception would roll back all the changes.

An effects system would at least catch the omission; it'd flag a function that makes multiple write calls to objects not mentioned in a finally cleanup block.

In languages without exceptions, you have to use a signal handler in the above example; in languages with exceptions, you can use a finally clause for the stty echo.
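A sketch of the finally-based fix, with stty replaced by a toy set_echo function so the rollback is observable without a terminal (all names here are invented for illustration):

```python
echo_state = ["on"]  # stand-in for the terminal's echo setting

def set_echo(on):
    echo_state[0] = "on" if on else "off"

def read_password(read_input):
    set_echo(False)
    try:
        return read_input()
    finally:
        set_echo(True)  # runs even if the user kills the read

def interrupted():
    raise KeyboardInterrupt  # simulate the user changing her mind

try:
    read_password(interrupted)
except KeyboardInterrupt:
    pass
assert echo_state[0] == "on"  # echo restored despite the interrupt
```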

(In the spirit of the recent exception buzz: Joel Spolsky's popularization of Raymond Chen's rant, and GvR's "resource allocation" work.)

Friday, May 06, 2005

Click For Each Page: ACM Queue

Some sites have an obnoxious policy of requiring you to page through their content by clicking "next", "next" a bunch of times. Presumably they do this to better gauge interest in the content; the user cared enough to click to the end of the article. (Web browsers don't naturally support reporting back to the server how far the user scrolls in a page or how long a user has a page open (and focused).) Or they could just be incompetent.

To get around this, you can often click their button "Print This Article" or equivalent.

For some sites, you may have to visit the last section and then click "Print This Article" there. Cute, eh? This is how I read ACM Queue.

Capacity For Self Deception

Someone should start a broad catalog of memes, not just pop-culture memes. Google book search shows our "capacity for self deception" going back to the 19th century.

Though it's easier to demonstrate this for criminals, it also applies to the best of us.

Friday, April 29, 2005

We Need Core Data

Something like Apple's new Core Data data-model-framework is long overdue, but we need a standard, free, cross-platform implementation.

The hierarchical filesystem is not meeting our needs. For all of our documents, we want:
  • easy persistence
  • undo and redo
  • gui prototyping
  • revision control
  • transactions
  • search
  • annotation
(Core Data only provides the first few features.)

This new API can't handle behavior and stay language/platform independent. That is, it can't support the object oriented approach of only allowing access to state through member functions.

On the other hand, it would be nice to use regular oop languages to specify the model.

Thursday, April 21, 2005

Justification of Python Sequence Subscripts

Which notation should be used for slicing a subsequence of m through n items?
  1. myseq[ m : n ]
  2. myseq[ m : n+1 ]
  3. myseq[ m-1 : n ]
  4. myseq[ m-1 : n+1 ]
#3 and #4 are silly, because they're inconsistent with the natural way to refer to a single item myseq[m].

Perl uses #1, which is wrong because you can't iterate down to an empty subsequence with myseq[m:m]. Representing this with myseq[m:m-1] is clearly evil, and it breaks completely for m = 0.

The rationale for indices starting from 0 instead of 1 isn't as strong: that representing the whole list as myseq[0:len(myseq)] is "nicer" than representing it as myseq[1:len(myseq)+1]. Perl doesn't even have this rationale :).

Ironically, reverse indices in python do go from 1 to n+1, or rather from -1 to -(n+1). This is fortuitous, because representing reversed(myseq) with myseq[-1::-1] is nicer than representing it with myseq[0::-1].
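The half-open, zero-based choices above can be checked directly in Python (a small sanity-check sketch):

```python
seq = ['a', 'b', 'c', 'd', 'e']

# items m through n (here 1 through 3) use the half-open m:n+1 form
assert seq[1:3+1] == ['b', 'c', 'd']

# the empty subsequence is representable at any position
assert seq[2:2] == []

# the whole list, the rationale for starting at 0
assert seq[0:len(seq)] == seq

# reverse indices run from -1, as noted above
assert seq[-1::-1] == list(reversed(seq))
```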

inspired by a note from Dijkstra

Wednesday, April 20, 2005

Type, Effect, and Inference

Effect systems are nice for the same reasons that Type systems are nice, and they're also a big win for concurrency and for encapsulation. Effects, as I understand them, allow specifying how code reads and writes state.

If you're wondering how to convince people to create even more code annotation, don't worry; we'll be able to infer effect information from the code itself, just as we can infer type information from the code itself.

Even with perfect inference, explicit annotation will be useful for external data and for specifying contracts.

Thursday, April 07, 2005

RDF and the WWW

Just had an epiphany: RDF is very similar to the world wide web.

They're both edge-labelled directed graphs.

Though people generally aren't so careful with their link texts, these are just like RDF predicates. The linking page is the subject, and the linked page is the object.

Tomboy and the Future

Tomboy is an excellent program for linux.

It looks like a simple implementation of post-it notes, but it has a few features which make it wonderful:
  • automatic save
  • ability to easily link any text to a new note (sort of like a wiki)
  • fast full search
  • clickable web links
There are more features that'd yield incredible power:
  1. "who links to me?" button
  2. ability to list nodes that are linked to by all of a set of nodes
  3. ability to open a node by path, with autocompletion
  4. "publish" button, for a single node or for a group of nodes
  5. ability to manage the graph graphically
#2 would enable the now fashionable folksonomy functionality.
#3 would replace bookmarks.
#4 would implement a blog authoring tool.
#5 would give you a mindmap.

This stuff shouldn't be in a particular application, but should be available to the whole system. It'll be great if this notes application demonstrates of the power of the approach, and motivates people to implement it more widely.

Monday, April 04, 2005

Corporate Personhood

Here's my attempt to distill the issue of corporate personhood. The best book on the subject that I've seen is The Corporation : The Pathological Pursuit of Profit and Power, by Joel Bakan.

Corporations are disproportionately powerful, because they concentrate huge amounts of money and therefore political influence. Is this good or bad? On the one hand, "what's good for GE is good for America." On the other hand, corporations by law must pursue no objective over maximizing shareholder value. This means externalizing costs whenever possible. For example, a corporation sells 100,000,000 widgets whose use will kill 100 people. That corporation may not legally settle for a smaller profit in order to make its widgets safer, and thereby save the 100 people. (The "what's good for GE" proponents may counter that the public sector makes such cold-hearted calculations all the time, and that the economic activity of the widgets is part of the "rising tide that lifts all boats," making the whole society wealthier and safer.)

What does this have to do with "corporate personhood?" Corporations concentrate huge amounts of capital, but they diffuse almost all responsibility. When a corporation makes a million dollars, that money is shared by all its owners. When a corporation kills a person, that guilt is claimed by no one.

Though a corporation is fundamentally no more than a financial partnership, the existence of the corporate entity insulates its officers from morality and from the law. A bad action could spring from the combined actions of many members of a corporation, though none of the individual actions is bad. Even when a single member performs a bad action on behalf of the corporation, it's difficult to detect and prosecute. This insulation itself encourages bad behavior, because members' responsibility to the corporation is more pressing than their responsibility to the society and its laws.

Basing the rights of the corporation on the rights of the person gives corporations additional power. The law gives individuals various rights to protect them from the power of the state (e.g. the Bill of Rights). Sharing these rights with corporations gives them even more power. Treating people and corporations the same allows using a rhetoric of "freedom" to weaken and eliminate regulation, when regulation is the best tool for making corporations consider important externalities, and the only way to correct market failures.

Friday, April 01, 2005

Emulating maxdepth for Solaris find

Say you want to find files in the top level of a directory that meet a certain condition (e.g. haven't been accessed for at least three days).

GNU find can do this with "-maxdepth 1", but Solaris find can't. To get the same functionality for Solaris find, you can use this hack:
find /some/path/. \( -type d -a \! -name . -prune \) -o -type f -print

That is, first prune all directories that aren't named ".", then print the names of the remaining files. It's important that the path be "/some/path/." instead of "/some/path", because that's what enables special treatment for the top level directory.

Thursday, March 31, 2005

Partial Replacing Lambda?

Wonder if partial functions are going to replace lambda in python3000.

partial(operator.add, 1) isn't such an attractive substitute for lambda x: x+1, but that's more about operator than it is about partial().

Maybe partial()+1 should be supported using operator overloading. It'd be especially nice to be able to do partial()+x+y instead of partial(operator.add, x+y)

This form should probably be a differently named built-in, but built-ins aren't cheap, and the no-argument constructor isn't otherwise meaningful.
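With functools.partial, which was standardized shortly after this post (PEP 309, Python 2.5), the comparison looks like:

```python
from functools import partial
import operator

add_one = partial(operator.add, 1)
assert add_one(41) == 42

# the lambda spelling it would replace
assert (lambda x: x + 1)(41) == 42
```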

Wednesday, March 23, 2005

Humble and Smart

The more you learn about something, the more you realize how little you know about it.

This isn't just philosophy; when people know nothing about a thing, they assume that there's "nothing to it". Then they confidently make decisions based on incomplete information. (It could be because interesting elements exploit the interplay of the simple and the complex. Or it could just be because small details aren't visible at a distance.)

Paradoxically, bright people are more susceptible to this thinking. Bright people are accustomed to quickly understanding ideas outside of their field of expertise, and this understanding is often superior to experts'. A bright person can start to believe (subconsciously) that much of the work in a field is characterized by lack of understanding.

This has ramifications for religious leadership. That religious leaders have insight into issues not technically within the sphere of religious practice is a doctrine of Judaism, and probably other religions as well. It's only logical; following Judaism is about living wisely, and someone with a deeper understanding of Judaism should have attained greater wisdom. In order to be truly wise, however, a person must be aware of his limitations. Maybe this awareness can be achieved by encountering sufficiently dramatic examples (for example the abuse cases involving religious leaders in recent years), or maybe it requires learning a little bit about a lot of different things.

(It is troubling to think that G-d confers special wisdom in a way that is completely independent of the natural working of the world. For one thing, thinking that you've got supernatural aid from G-d certainly exacerbates the problem described above. For another, there is no good way to tell when you got it, and when you don't. "Trust in God, but steer away from the rocks", applies here as well.

At any rate, I don't think Judaism encourages completely ignoring the rules of the physical world. There's a gemara in niddah that gives advice about achieving various goals; in each case the gemara advises both acting according to natural way of the world, and also praying for success. In this era characterized by hester panim ("G-d hiding His face"), we certainly shouldn't abdicate the responsibility to do our hishtadlus (effort), especially where other people are involved.)

Friday, March 18, 2005

Idiomatic Python

Programming languages, like human languages, each have their own idiomatic usage. Though there are many different ways to express a single thing, some patterns of expression just feel much more natural.

The idiomatic style of python is beautiful.

Here's an example. You'd like to allow categorizing emails using their subjects, by prefixing the normal subject with the category and a colon. So for a message with subject "game on tuesday", you could use subject "sports: game on tuesday". If no category is specified, you'd like to use "misc". Here's the Pythonic way to do it:
try:
    category, subject = headers['Subject'].split(':')
except:
    category, subject = 'misc', headers['Subject']

There are a few different features at work here:
  • "It's easier to ask forgiveness than permission"
  • Exceptions are used heavily
  • Language support for collections is simple and easy
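The same idiom, wrapped as a function so it can be exercised (catching ValueError specifically, rather than everything, is a hypothetical tightening of the bare except above):

```python
def categorize(subject):
    try:
        category, rest = subject.split(':')
    except ValueError:  # no colon: unpacking to two names fails
        category, rest = 'misc', subject
    return category, rest.strip()

assert categorize("sports: game on tuesday") == ("sports", "game on tuesday")
assert categorize("game on tuesday") == ("misc", "game on tuesday")
```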

Friday, March 11, 2005

"Startups" and Small Businesses

At least since the dot-com bubble, the idea of the startup has been enshrouded in glamor. Most recently Paul Graham posted How to Start a Startup, but his advice is for dot-com-bubble type startups. He suggests that starting a company means compressing a career of 40 years into 4, and assumes that every new business will either flop or become an industry giant.

It could be that when Graham says "startup", he means a company that would rather flop than settle for staying small. Only talking about these startups is strange for a few reasons. Many companies consider themselves successful without making a public stock offering, or being bought out for piles of cash. How about creating a successful small company first, on the way to becoming an industry giant later? And which ideas and strategies are appropriate for which goal? These questions are a lot more relevant today than how to handle a Venture Capitalist imposed CEO.

After all, though more people are employed by big companies, there are many more small companies than big ones.

Wednesday, March 09, 2005

Note to Future Self

Whenever I put something down in an unusual place, there's a strong chance I will "lose" it.

Hopefully when I'm older I'll look back at this note and realize that I'm not going senile.

Tuesday, March 08, 2005

Ipod Shuffle for Lectures

The IPod Shuffle is a wonderful device. One more feature would make it perfect for listening to lecture series (or books on tape or language courses), without bloating its wonderfully simple interface.

The IPod Shuffle should remember the current "song" after it's turned off.

So apparently this is supported for purchases from ITunes and Audible. In order to get it to work with anything else, you have to use faac to create .m4b files.

UPDATE: It works great. If you're using linux, get gnupod, mpg321, and faac. Then you can do something like:
mkdir /mnt/ipod 2>/dev/null
mount /dev/sda1 /mnt/ipod
for mp3 in *.mp3
do
m4b=${mp3%.*}.m4b
mpg321 -w $mp3 | faac -b 10 -c 3500 - -o $m4b
gnupod_addsong.pl -m /mnt/ipod $m4b
done
mktunes.pl -m /mnt/ipod
umount /mnt/ipod
Incidentally, it would be great if gnupod would expose a filesystem-type api to the ipod using fuse.

UPDATE#2: Don't use gnupod; use the iPod shuffle Database Builder.

Thursday, March 03, 2005

Warn of Unsaved Changes Javascript

OK, this will get all the javascript out of my system. This one is intensely useful for web "applications". It warns the user if she tries to leave the form with unsubmitted changes, whether leaving by moving to a different page or by closing the window. It requires a recent firefox or ie browser.
<body onLoad="lookForChanges()" onBeforeUnload="return warnOfUnsavedChanges()">
<form>
<select name=a multiple>
<option value=1>1
<option value=2>2
<option value=3>3
</select>
<input name=b value=123>
<input type=submit>
</form>

<script>
var changed = 0;
function recordChange() {
  changed = 1;
}
function recordChangeIfChangeKey(myevent) {
  if (myevent.which && !myevent.ctrlKey && !myevent.altKey)
    recordChange(myevent);
}
function ignoreChange() {
  changed = 0;
}
function lookForChanges() {
  for (var i = 0; i < document.forms.length; i++) {
    for (var j = 0; j < document.forms[i].elements.length; j++) {
      var formField = document.forms[i].elements[j];
      var formFieldType = formField.type.toLowerCase();
      if (formFieldType == 'checkbox' || formFieldType == 'radio') {
        addHandler(formField, 'click', recordChange);
      } else if (formFieldType == 'text' || formFieldType == 'textarea') {
        if (formField.attachEvent) { //ie
          addHandler(formField, 'keypress', recordChange);
        } else {
          addHandler(formField, 'keypress', recordChangeIfChangeKey);
        }
      } else if (formFieldType == 'select-multiple' || formFieldType == 'select-one') {
        addHandler(formField, 'change', recordChange);
      }
    }
    addHandler(document.forms[i], 'submit', ignoreChange);
  }
}
function warnOfUnsavedChanges() {
  if (changed) {
    if ("event" in window) //ie
      event.returnValue = 'You have unsaved changes on this page, which will be discarded if you leave now. Click "Cancel" in order to save them first.';
    else //netscape
      return false;
  }
}
function addHandler(target, eventName, handler) {
  if (target.attachEvent) {
    target.attachEvent('on' + eventName, handler);
  } else {
    target.addEventListener(eventName, handler, false);
  }
}
</script>

Wednesday, March 02, 2005

Telephone Shell Javascript Widget

Here's some javascript I wrote to coerce uniformly styled phone numbers. If you don't use any punctuation, it defaults to american style. You can enter unrestricted text after the number. You should be able to copy the following into an html file to try it out.
<form>
phone #<input onkeypress="return phone_only(event)">
</form>

<script>
function phone_only(myevent) {
mykey = myevent.keyCode || myevent.which; //ie||netscape
myfield = myevent.srcElement || myevent.target; //ie||netscape
if (mykey == 8) //backspace (netscape only)
return true;
f = myfield.value;
g = myfield.value;
ndigits = f.replace(/-/g,'').length;
ngroupdigits = g.replace(/.*-/,'').length;
if (ndigits == 0) {
if (50 <= mykey && mykey <= 57) { //2-9, can't start with 0 or 1
return true;
} else {
return false;
}
} else if (ndigits <= 7) { //only need 2 hyphens: 123-456-7
if (32 <= mykey && mykey <= 47 && ngroupdigits != 0) { //punctuation
myfield.value += "-";
return false;
} else if (48 <= mykey && mykey <= 57) { //0-9
if ((ngroupdigits % 4) == 3) {
myfield.value += "-";
}
return true;
} else {
return false;
}
} else {
return true;
}
}
</script>
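The keystroke handler above interleaves key filtering with hyphen insertion. The formatting rule it enforces -- a hyphen after the third and sixth digits, american style -- can be sketched on its own as a pure function (formatPhone is an invented name, not from the widget):

```javascript
// Format a string of digits in the american style the widget coerces:
// a hyphen after the third and sixth digits, e.g. 123-456-7890.
function formatPhone(digits) {
  var out = '';
  for (var i = 0; i < digits.length; i++) {
    if (i === 3 || i === 6) out += '-';  // close off the 3-digit groups
    out += digits.charAt(i);
  }
  return out;
}

formatPhone('2345678901');  // '234-567-8901'
formatPhone('2345');        // '234-5'
```

A pure function like this is also easier to test than the event handler, which needs a browser and synthetic key events.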

Tuesday, March 01, 2005

Forms Support in Javascript, not Template

Turns out it's pretty simple to populate forms generically in javascript. This means that your template engine doesn't need special html forms support; you can just use a pair of templating "for" loops to create the javascript data structure. You could even grab the data using xmlhttprequest. Html made the mess, so html can clean it up.

Here's an example form, followed by the code. You should be able to copy into a file and try it.
<form>
<input name=a>
<input name=b type=checkbox value=x1>
<input name=b type=checkbox value=y1>
<input name=b type=checkbox value=z1>
<select name=c multiple>
<option value=x2>X2
<option value=y2>Y2
<option value=z2>Z2
</select>
<button onClick="populateForm(this.form, {'a':'x0',b:['x1','z1'],'c':['x2','z2']}); return false">Populate</button>
<input type=reset>
</form>

<script>
function populateForm(myForm,myHash) {
for (var k in myHash) {
if (!(k in myForm)) continue;
if (typeof myHash[k] == 'string')
myHash[k] = [myHash[k]];
if (myForm[k].type == 'text') {
myForm[k].value = myHash[k][0];
} else {
var field = 'type' in myForm[k] && myForm[k].type.match('^select') ? 'selected' : 'checked';
var selected = {};
for (var i=0; i<myHash[k].length; i++)
selected[myHash[k][i]] = 1;
for (var i=0; i<myForm[k].length; i++)
myForm[k][i][field] = myForm[k][i].value in selected;
}
}
}
</script>
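populateForm rests on two small steps: normalize each value to an array, then build a membership set so each checkbox or option can ask "should I be selected?". A sketch of just that logic, with an invented helper name:

```javascript
// Normalize a string-or-array value to a lookup table of selected values,
// the shape the checkbox/select branch of populateForm needs.
function toMembershipSet(value) {
  var values = typeof value === 'string' ? [value] : value;  // wrap scalars
  var set = {};
  for (var i = 0; i < values.length; i++) set[values[i]] = true;
  return set;
}

var selected = toMembershipSet(['x2', 'z2']);
// 'x2' in selected -> true, 'y2' in selected -> false
```

With this helper, setting each option is one line: `option.selected = option.value in selected`.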

Tuesday, February 22, 2005

Even More On Presentation Done Right

I changed my mind back; StringTemplate isn't the way to go.

Templates are neither necessary nor sufficient for separating model and view (presentation). Not necessary because the separation is really accomplished by the model/view protocol, and not sufficient because presentation code can always seep into the model code. In fact, the StringTemplate approach requires complex views to be implemented in the model code.

A great m/v separation may be achieved by just having separate code for model and for view, even in the same language. The only requirements are on the data passed from the model to the view:
  1. it should be available before the view code runs
  2. it should consist only of lists of lists, and lists of opaque objects
Flagrant violations are possible (like the view code writing to a database), but should be easy to spot. The view code isn't artificially limited; it can implement fancy animations or fractal decorations, for example.
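A minimal sketch of the separation described above, with invented names and data: the model hands the view only lists of lists, and the view only formats them. Both live in the same language, yet the dependency runs one way.

```javascript
// Model: knows where data comes from, nothing about display.
// It returns only lists of lists, per requirement 2 above.
function model() {
  return [['ibm', 93.5], ['dell', 41.2]];
}

// View: knows how to render rows, nothing about where they came from.
function view(rows) {
  return rows.map(function (row) {
    return row[0] + ': ' + row[1];
  }).join('\n');
}

view(model());  // 'ibm: 93.5\ndell: 41.2'
```

Because the data is fully materialized before view() runs (requirement 1), the view can be swapped for a fancier one without touching model().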

It's probably worth noting explicitly that the view depends on the model and not vice versa.

Recent javascript interfaces from google seem to contradict requirement number one, but really they just use a lazy loaded model. The view treats the model as if it's present, but because of the model's prohibitive size, each piece is loaded only just before it's needed.

The StringTemplate author acknowledges that his solution is insufficient to keep view logic out of the model code. The model code could pass data like "the color red" to the view layer. Since StringTemplate is such a simple language, more complex view features must be implemented with the help of model code. Any possible document may be generated, but only with that help.

So if m/v separation isn't the point of templates, what is? Separation of roles. The roles of programmer and designer aren't necessarily performed by different people, but they do require different approaches. The designer role includes everything that can be done in a WYSIWYG editor, with all visible elements, "includes", and conditionals. As document formats evolve, this could become part of the normative editor functionality. CSS visibility could be hacked to support conditionals, and XInclude takes care of the rest.

Sunday, February 20, 2005

Google Maps and Free Source

If you haven't already, check out Google Maps; it's extremely cool. Besides being a generally superior implementation of maps, it loads new maps without reloading the page in your browser, just like gmail. After you view a map, its pieces are cached by your browser. This means they're fast, and can be viewed even without an internet connection.

A yet un-remarked-upon feature of these "network applications" is how similar they are to free source. Though the intellectual property restrictions of google and of free source are just as strict as those of the most proprietary software vendor, the implementations of both google maps and GPL'ed code are available for anyone to poke at. GPL developers have so much faith in their model that they don't worry about commercial competitors, and google developers apparently have so much confidence in their own skills that they don't worry either.

You already have free access to both the google maps client software and the map data. Without too much difficulty, you could extend either. For example, you could use the google client to view your own map data, you could run your own software on the map data (which is already cached on your pc), and people are already hacking new features onto the whole google maps solution.

The major change in these new applications is that the interface is exposed. With regular google search, you either have to use the Google Web APIs or else screenscrape the html pages. With a little reverse engineering of the new applications, you can access the same interface that's used as part of their normal functioning.

(The Google Web APIs limit developers to 1000 queries a day, and presumably a higher volume of screenscraping or access to the new APIs would get shut down by Google.)

The Internet Changing Everything

  • Traditional categories of media become irrelevant

    Does a given piece of information belong in a book, an academic paper, a weekly magazine article, an editorial, a letter to the editor? The internet makes the distinctions increasingly irrelevant.

  • The description of an idea may be canonicalized

    A web page is an expression of an idea that is universally available (at least theoretically). It's easy to refer to, and easy to maintain. The internet will one day provide a distributed wikipedia, in which every idea has a URI and page(s) containing all public human knowledge of that idea.

  • The context of an idea may be completely specified

    Traditional bibliographies are clunky in comparison. It will be possible to link a given piece of information with every other piece of information on which it relies. That metadata will itself be data, subject to easy analysis.

Apologies for early outline version.

Thursday, February 17, 2005

URL Features

Computers should record all operations on URLs. For example, I should have a list of URLs that I've emailed (along with the recipient and date), that I've instant-messaged, or printed, or transferred, or mentioned in any document.

"URL" means both local and remote. I should have a full history of every file I've opened (explicitly), even if it wasn't using the browser. Another operation that I want recorded is when I change files, and when I post forms.

I should get smarter handling of "cached" and downloaded files. Browsers already have caches, so why should I have to manually make copies of files for speed or reliable access?

Thursday, February 10, 2005

Perl Problems

1 Obscure Features

The core language has so many features that even the greatest perl wizard does not know them all. Find a wizard and ask him how many of the following features from the core language he understands (without consulting the manual).

  1.  split(' ') vs split(/ /) vs split('') vs split(//)  
  2.  $x{'a', 'b'} vs @x{'a', 'b'}
  3.  $[ = 3
  4.  [{foo => 1, bar => 2}, "FOO", "BAR"]

And here's a realistic example of idiomatic perl:

    sub with_gs_acct_check_digit{"$_[0]-".substr(10-(map$_[2]+=$_,map/\d/?$_[1]++%2?split'',$_*2:$_:0,split'',$_[0])[-1]%10,-1)}

Since these features usually address a real need, they can be very tempting to use. Even if someone were to do the work to specify a sane perl subset, there is no tool to enforce its use. Often a perl expert isn't aware that the language features that he uses are obscure or difficult for the nonexpert to understand.

Perl's fast feature accretion is a direct result of its core design philosophy, "There's More Than One Way To Do It".

2 Few Errors are Caught by the Compiler

Since every new feature is encoded using a fixed set of operators, very few errors actually cause a compiler error.

    print "Strings are equal\n" if "abba" == "baba";

2.1 Context Confusion

Every perl expression can produce entirely different results depending on its "context", the code in which the expression occurs. The following example will trip up even experienced perl coders:

    sub myPrint($) {
        my $arg = @_;    # scalar context: $arg gets the length of @_ (1), not $_[0]; the fix is my ($arg) = @_
        print "the first argument is $arg ?\n";
    }

3 Unsafe Features

Many commonly used features are unsafe. The "use strict" directive only catches the most blatant.

3.1 $_

For example, one of perl's major conveniences is the $_ short-hand for referring to the current element of a loop. The following is a great way to print upper-cased versions of some strings.

    @t = qw( ibm dell msft );
    for (@t) {
        print uc "$_\n";
    }

However, if you want to use a different function in place of "uc", this code may fail in mysterious ways. Functions called in the body of the loop can alter what $_ refers to, and neither the caller nor the callee may realize the danger.

3.2 Exceptions

Exception support is inconsistent and dangerous. When most functions fail, they just return false and sometimes an error code. Other functions expect to be called in "eval blocks", and will otherwise cause the entire program to terminate (e.g. mkpath from File::Path).

3.3 Auto-vivification

Another example is auto-vivification, the on-the-fly creation of data structures. Given the following data structures:

    use strict;
    my %myhash1 = ( "a"=>1, "b"=>2 );
    my %myhash2 = ( "myhash1"=>\%myhash1, "c"=>3, "d"=>4 );

Perl will properly catch the error:

    $myhash3{c}++;            # %myhash3 is not defined

But will happily allow:

    $myhash2{myhash3}{c}++;   # a new hash will be created!

4 Data Structures

Nested data structures in perl are unwieldy. They require using references, and most people are confused by references.

Wednesday, February 09, 2005

Reusing Software

A big difference between software and physical engineering is that software changes greatly over the course of its life. The object oriented programming approach is supposed to make this change easier and more reliable. Instead of duplicating and modifying code, we'd like to extend existing code and make our changes in the extensions.

There are still a few impediments to reuse in popular platforms. The following is a small collection of research into these impediments.

  • There mustn't be conflicts between new symbols added to the base code and the extensions. It wouldn't be difficult for languages to support this better.

  • The non-private call graph of public and protected methods in the parent mustn't change.

  • There must be a way to better encapsulate objects.

  • There must be a way to encapsulate and compose concurrent access to state.
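The first impediment, symbol conflicts between base code and extensions, can be sketched in javascript. Here a later version of a base class adds a method whose name an extension already used for its own helper, silently rerouting the base's internal call. All names are invented for illustration:

```javascript
function Base() {}
Base.prototype.save = function () {        // added in base v2
  return 'base save';
};
Base.prototype.run = function () {
  return this.save();                      // base v2 now calls save() internally
};

function Extension() {}
Extension.prototype = Object.create(Base.prototype);
Extension.prototype.save = function () {   // private helper, written against base v1
  return 'extension save';
};

new Extension().run();  // 'extension save' -- the base's internal call was hijacked
```

The extension author never intended to override anything; a language that distinguished "new method" from "override" at the declaration site would catch this at compile time.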