Web Forms: Death By a Thousand Textboxes

Jeff - surely you aren’t suggesting that we should all develop something as complex as Google and its constantly maintained database just to parse addresses?

Oh, and Google FREQUENTLY misparses addresses and gives me totally wrong information. It works much better if you start layering in information like the city first, so it can interpret the rest in context.

To give you an idea of how complex the problem is, Calgary, AB breaks the city into 4 quadrants: NW, NE, SE, SW. Edmonton, AB, a mere 300 km away, uses straight N, W, S, E. Google works because it has HUGE databases of street addresses that it can run its standard search engine against. The database is constantly maintained and the code is continually tweaked.
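To make the Calgary/Edmonton point concrete, here is a minimal sketch of one tiny piece of that parsing: splitting a trailing directional token (NW, NE, SE, SW, or plain N, S, E, W) off a street line. The class name is hypothetical, and it deliberately ignores everything else a real address parser would need.

```csharp
using System;
using System.Linq;

// Hypothetical helper: splits a trailing directional token off a street line.
public static class StreetDirection
{
    // Calgary-style quadrants plus plain single-letter directions.
    static readonly string[] Directions = { "NW", "NE", "SE", "SW", "N", "S", "E", "W" };

    // Returns the street part and the direction ("" if none was found).
    public static (string Street, string Direction) Split(string line)
    {
        string trimmed = (line ?? "").Trim().TrimEnd('.');
        int lastSpace = trimmed.LastIndexOf(' ');
        if (lastSpace < 0)
            return (trimmed, "");

        // Compare the last word case-insensitively against the known directions.
        string last = trimmed.Substring(lastSpace + 1).ToUpperInvariant();
        if (Directions.Contains(last))
            return (trimmed.Substring(0, lastSpace).TrimEnd(), last);

        return (trimmed, "");
    }
}
```

A street genuinely named, say, "Avenue E" would be misread as "Avenue" plus a direction of E, which is exactly the kind of ambiguity this comment is warning about.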

Totally appropriate for a search engine, but way more effort than a simple address box can justify.

All valid points. If you did attempt the single text box approach, you should show the user how you are interpreting the data behind the scenes, either on page refresh or asynchronously. If the parser has made any incorrect assumptions, the user can then make the adjustment and move on…

Not ideal for many users who will be forced to do this because of the dozens of inconsistencies across the globe… but a fun little project for someone.

Xepol, you’re focusing on the wrong things (user “stupidity”, parsing complexity, the requirement to parse at all). Go to the root causes that Jon Galloway outlined.

This is a thought question. We should be asking why we have this problem!

I have dedicated a good deal of time to parsing and converting address data, and as has been said before, if it isn’t only to be used verbatim, it is a bitch.

By far the hardest of the lot was parsing. People are very imaginative and diverse in how they use address fields, and even when they are consistent there is ambiguity. In my field (property), it is especially important to try to squeeze what people think their address is into a template.

We’ve recently had to face this problem and we had to split our address fields.

The biggest problem is that when people have the choice, they leave important data out, like postal codes or provinces. This can cause a whole lot of headaches later.

Also you can’t compare google search addresses to filling in addresses on forms. Google will parse what it can and give you back the image. If it’s not right, the user can then change the search criteria. But you can’t do that with stored data when sending parcels for example.

As a Dutch user, I absolutely detest multiple-editbox address forms. They get it wrong – always – unless I deal with a Dutch or Belgian web site. One big edit box is just more user friendly, because as this[1] page shows, addresses are essentially unparsable.

However, as noted, companies need more detailed information in their database, so (just a quick idea) why not let the user fill it in free-form, then let them use the mouse to click on their town name, own name, etc., as the application requires?

I also like the free-form - parse it - multiple editbox idea.

Maybe a separate Passport™-like Address Markup Language™ is required, but by the time that has any decent browser support, we’ll be a decade and a half further along.

[1] http://www.bitboost.com/ref/international-address-formats.html#Formats

"If it’s not right, the user can then change the search criteria. But you can’t do that with stored data when sending parcels for example."
Why not? Every company I know that uses only house number and postal code asks you to confirm that the details they interpreted are correct. You can check with the user that the data is correct.
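The "postcode plus house number, then confirm" flow can be sketched as follows. This assumes Dutch-style postcodes (four digits with a non-zero first digit, then two letters, e.g. 1234 AB); the class and method names are hypothetical, the street and town are assumed to come from a postcode lookup service that is not shown, and it skips any finer rules about which letter pairs are actually issued.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for a "postcode + house number, then confirm" form.
public static class DutchPostcode
{
    // Four digits (first one non-zero), an optional space, then two letters.
    static readonly Regex Pattern =
        new Regex(@"^\s*([1-9][0-9]{3})\s?([A-Za-z]{2})\s*$");

    // Returns the canonical "1234 AB" form, or null if the input does not match.
    public static string Normalize(string input)
    {
        Match m = Pattern.Match(input ?? "");
        if (!m.Success)
            return null;
        return m.Groups[1].Value + " " + m.Groups[2].Value.ToUpperInvariant();
    }

    // Text echoed back to the user for confirmation. Street and town are
    // assumed to come from a postcode lookup service (not shown here).
    public static string ConfirmationText(string postcode, int houseNumber,
                                          string street, string town)
    {
        return $"{street} {houseNumber}, {postcode} {town}. Is this correct?";
    }
}
```

If the user answers "no", the form can fall back to showing the individual fields for correction.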

Two comments:

Edge cases make programming hard.

No, programmers make edge cases hard by focusing on them instead of the other 99%.

From my point of view, software has to work in 100% of situations, and edge cases are just cases that have to work like ‘normal’ cases. That split between normal and edge doesn’t exist in reality; you simply divide the situation into what you call ‘normal’ cases and push all the hard stuff into ‘edge cases’.

I usually approach the problem without making that distinction: the algorithm has to work with all the data that can come in.

===============================================

On the other side, let’s talk about money. What if you tell your customer: I will do a single textbox that can read all the data entered by the user. It’s just $1500 (probably more).

As long as the users’ extra typing doesn’t make my customer lose his customers, I’m pretty sure that he will say: ‘forget it’.

Or what if you tell the end user: you will be able to put all the data in a single text box, but everything you buy here will be 1% more expensive. What would the end user prefer?

I see people talking here as if coding was free …

What do you think?

I had a bad experience over the weekend. I was filling in a web form, and it had boxes for first name, surname, and then a big one for address as in your address label example.

So I filled it all in, with my full address in the (huge, multiline) address box, and then noticed the next three boxes - Town/City, State/County, Postcode… To say that this annoys me is an understatement… Granted, I should probably do a bit of read-ahead, but all I was doing was filling in an address…

Carl

legally changed their name so that it is now just one name.

So “firstname”=name, “lastname”=name. I get that a lot, there are a heap of places that send stuff to Ms Moz Moz for no readily apparent reason.

Having normalised one database, I can say users are quite imaginative. We got more than 20 spellings of Adelaide, for instance (it’s not easy - try it). But really, we don’t care. Print it, post it, next question.
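If you did want to clean up variants like those 20 Adelaides, one simple approach is a hand-maintained alias table applied after basic whitespace and punctuation cleanup. The misspellings in the table below are made up for illustration, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CityNormalizer
{
    // Hypothetical alias table: known variants mapped to a canonical name.
    // In practice it would be built from the spellings actually found in the data.
    static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["adelaide"] = "Adelaide",
            ["adelade"] = "Adelaide",
            ["adleaide"] = "Adelaide",
            ["adelaide sa"] = "Adelaide",
        };

    // Collapses runs of whitespace, strips trailing dots/commas, then looks the
    // result up case-insensitively. Unknown values come back cleaned but unmapped.
    public static string Normalize(string raw)
    {
        string cleaned = string.Join(" ",
                (raw ?? "").Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .TrimEnd('.', ',', ' ');
        return Aliases.TryGetValue(cleaned, out string canonical) ? canonical : cleaned;
    }
}
```

Unknown spellings pass through unchanged, so they can be reviewed and added to the table later rather than silently guessed.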

If you want demographic data about your users, ask for it.

Did you know that there are PROFESSIONAL FORMS DESIGNERS and FORMS ANALYSTS who actually earn their livings keeping amateur crap like this from happening??

Hire a pro and this wouldn’t happen.

http://www.bfma.org

A (US) phone number is generally parsable without a lot of effort. As the original post suggests - just once for the developer vs 10000 times for users.

On the other hand, a (US) address parser is WAY more complex, requiring purchased (is monthly good enough?) data. A big expense. And why parse rather than just leaving a lump? Apart from the aforementioned sales analysis, the USPS will give you discounts for sorting your bulk mailings to those sales analyzed ZIP codes.
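Once the ZIP code is its own field, grouping a mailing list by it is trivial. This sketch groups by the 5-digit ZIP (ignoring any +4 extension) and orders the groups; the Recipient type is hypothetical, and real USPS presort rules involve much more than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mailing-list entry.
public sealed class Recipient
{
    public string Name { get; }
    public string Zip { get; }
    public Recipient(string name, string zip) { Name = name; Zip = zip; }
}

public static class MailSort
{
    // Groups recipients by 5-digit ZIP (dropping any +4 part), lowest ZIP first.
    public static List<(string Zip, List<Recipient> Recipients)> GroupByZip(
        IEnumerable<Recipient> recipients)
    {
        return recipients
            .GroupBy(r => r.Zip.Length >= 5 ? r.Zip.Substring(0, 5) : r.Zip)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => (Zip: g.Key, Recipients: g.ToList()))
            .ToList();
    }
}
```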

Did you know that in the US you can send a letter to:
Joe Schmoe
123
19101-1234
(123 is the house or apartment number)
So you can really save yourself some database space, but good luck explaining that to your end user! :)

MoZ: That’s a good idea. Doesn’t help with things like insurance where legality is involved. But that’s a different kettle of fish.

If this comments window wasn’t fixed width and didn’t give an error when I click preview and then click on my name, it would be easier to take your comments on board, Jeff.

You like to play ping pong with other people’s comments, yet here are two problems which have been here as long as I can remember.

I know you’ll say it’s because of the software you use.
But I don’t care about that, I’m just a user being annoyed by a problem.

All you folks talking about autofill (not autocomplete, but the web browser autofill) are missing one factor.

The Autofill functionality was added because it is a pain to enter that data all the time. In all those boxes, in whatever format the developer decides on.

So the autofill functions were added to make life easier. And you still have to go through the hassle of getting autofill set up correctly (after a couple of submissions, it starts to remember).

I don’t have too much of a problem with the ‘1 textbox for the address’ idea (except as a developer I’d be glad for someone else to do the code for it!) but the single text box phone number entry? I beg to differ. To me, it’s far easier to type in COUNTRYCODE (TAB) AREACODE (TAB) NUMBER than it is for me to type in the number “(BRACKET)555(BRACKET)555(HYPHEN)5555”. Even allowing all formats (“5555555555”) can cause problems for me because as humans we remember phone numbers exactly like “COUNTRYCODE AREACODE NUMBER” - not as one long string of numbers.
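Because the digit-stripping approach discussed in this thread ignores separators, a single box still lets you type the number with spaces between the groups. Echoing it back in grouped form then covers the memorability point. A minimal sketch, assuming a 10-digit North American number with the country code already removed; the "(AAA) PPP-NNNN" style is one common US convention, and the class name is hypothetical.

```csharp
// Hypothetical display helper: takes the digits-only form of a number.
public static class PhoneDisplay
{
    // Formats exactly 10 ASCII digits as "(AAA) PPP-NNNN"; anything else is
    // returned unchanged so the user can see it and fix it.
    public static string Format(string digits)
    {
        if (digits == null || digits.Length != 10)
            return digits;
        foreach (char c in digits)
            if (c < '0' || c > '9')
                return digits;
        return $"({digits.Substring(0, 3)}) {digits.Substring(3, 3)}-{digits.Substring(6)}";
    }
}
```

For example, `PhoneDisplay.Format("2125550123")` gives `(212) 555-0123`.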

First Post - EXCELLENT blog, by the way…

If only there was some way to store information, like security information, on the computer in a secure manner. One that the user had to initiate. Some kind of information store that would allow any application to set up a little footlocker inside the store. What would we call such a thing? A Keychain maybe? ;)

http://developer.apple.com/documentation/Security/Conceptual/Security_Overview/Security_Services/chapter_4_section_6.html

http://developer.apple.com/documentation/Security/Conceptual/keychainServConcepts/index.html#//apple_ref/doc/uid/TP30000897

Yes, it’s a big step to implement the parser the first few times. But it would eventually become a common implementation.

Allowing free-form phone/fax entry and full zip-code entry (with optional zip extension) is very easy. Here is some basic C# code to extract exactly this with very little error:

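using System;
using System.Text;

// Keep only the digits, whatever separators the user typed
string FullPhone = Request["PhoneTextBox"] ?? "";
StringBuilder Digits = new StringBuilder();
foreach (char c in FullPhone)
{
    if (c >= '0' && c <= '9')
        Digits.Append(c);
}
string PhoneDigits = Digits.ToString();

// Remove optional "1" at the start
if (PhoneDigits.Length > 0 && PhoneDigits[0] == '1')
    PhoneDigits = PhoneDigits.Substring(1);

// A US number needs a 3-digit area code plus 7 digits
if (PhoneDigits.Length < 10)
    throw new FormatException("Not enough digits for a US phone number");

string AreaCode = PhoneDigits.Substring(0, 3);
string Phone = PhoneDigits.Substring(3, 7);
string Extension = "";
// Any digits past the first 10 are treated as an extension
if (PhoneDigits.Length > 10)
    Extension = PhoneDigits.Substring(10);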

See how simple that was? Remove everything that isn’t a digit, drop the optional leading 1, and you have pretty good data. I’ve been using this method for years and it’s worked perfectly. Of course, this is just my memory of how the code works, not a copy/paste, so sorry if there’s an HDD-formatting bug in there somewhere.

This method allows users to input their phone number in any crazy format they like (wtf is with people that like to do “1.234.567.8910” or “234/567+4559 x998!” ?).

The same method can be used to parse US zip codes…the first 5 digits are the zip, the next 4 are the zip ext if they are there. If the length of the “digits-only” string isn’t 5 or 9, that’s an error.
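The ZIP rule just described (strip to digits; 5 digits is a ZIP, 9 is ZIP+4, anything else is an error) might look like this as a standalone method. The ZipParser name is hypothetical, and it only checks the digit count, not whether the ZIP actually exists.

```csharp
using System.Text;

public static class ZipParser
{
    // Returns true and fills zip/plus4 when the digits-only form has 5 or 9 digits.
    // plus4 is "" when no extension was given.
    public static bool TryParse(string input, out string zip, out string plus4)
    {
        var digits = new StringBuilder();
        foreach (char c in input ?? "")
            if (c >= '0' && c <= '9')
                digits.Append(c);

        string d = digits.ToString();
        zip = null;
        plus4 = null;
        if (d.Length != 5 && d.Length != 9)
            return false;

        zip = d.Substring(0, 5);
        plus4 = d.Length == 9 ? d.Substring(5) : "";
        return true;
    }
}
```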

Of course, our website is only for our customers, and we only contract inside the US, so this may not be that great for other countries. For all I know, there are random ASCII glyphs of musical notes and fractions in foreign zip codes ;)

Address parsing is…uh…left as an exercise to the reader.

This same method works for

This has long been a pet peeve of mine too. I implemented a product last year that does data validation for ASP.NET web apps, and I made sure it didn’t propagate this broken pattern. The pre-fab validators for phone numbers, credit card numbers and the like will accept a variety of different formats from the user. The programmer gets a nice property that they can pull JUST the digits from without worrying about how the user typed it.
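For credit card numbers, the same "keep just the digits" idea pairs naturally with the Luhn mod-10 checksum, which catches any single mistyped digit and most swapped adjacent pairs. The comment doesn't say which checks that product runs, so treat this as a swapped-in illustration; the class name is hypothetical and the 12 to 19 digit length range is an assumption.

```csharp
using System.Collections.Generic;

public static class CardNumber
{
    // Strips spaces, dashes, etc., then applies the Luhn mod-10 checksum.
    public static bool IsValid(string input)
    {
        var digits = new List<int>();
        foreach (char c in input ?? "")
            if (c >= '0' && c <= '9')
                digits.Add(c - '0');

        // Assumed plausible length range for card numbers.
        if (digits.Count < 12 || digits.Count > 19)
            return false;

        int sum = 0;
        // Walk from the rightmost digit, doubling every second one.
        for (int i = 0; i < digits.Count; i++)
        {
            int d = digits[digits.Count - 1 - i];
            if (i % 2 == 1)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
        }
        return sum % 10 == 0;
    }
}
```

With this, `4111 1111 1111 1111` and `4111-1111-1111-1111` are both accepted, while changing the last digit to 2 is rejected.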