CoderFaire Atlanta 2013 RECAP

Whew, what a weekend! I’ve just got home from this thing called
“CoderFaire.”

What is a CoderFaire? Think of a bazaar for
technology and ideas – people, dusty from their travels, that come
together under one big tent to share in a common love for computers
and the wonderful things we make them do. Though we come from
different backgrounds, a common language is spoken – geek. Everyone
is accepted, everyone contributes and learns and leaves a little
richer in knowledge gained and friends made.

For some of us, this is heaven on Earth. I cannot thank the organizers
(Cal, Kathy, Jacques, Ben, Chris, and everyone else) enough for their
vision and for making this event happen.

I am deeply grateful to FoxyCart for making my trip possible, and to
Cal and everyone for the opportunity to speak to such a bright and
eager crowd.

Here’s one small thing that sets CoderFaire apart from other
conferences – sponsors are ONLY allowed to send their developers. No
marketers, no BS. I spoke with the guys from GitHub (hey
Kevin!), Mashery (hey
Neil!), Twilio (hey
Keith!), and Basho (hey
Hector!), and am still inspired by
their commitment to the development community, their willingness to
pitch in and help, and their infectious good attitude. It’s one of a
thousand things that made this weekend such an amazing experience.

In the course of two days I learned:

  • How intuition fuels the development process, the importance of
    slowing down, and the importance of pursuing a career you
    love. Thanks Ben, for the incredible keynote (and the ace beer
    selections).

  • How MailChimp handles MySQL problems, courtesy of Joe. It was
    incredible validation to learn that the problems I’ve run into as
    we’ve grown at FoxyCart are typical MySQL issues, and to get a peek
    inside a business that handles 6000 signups PER DAY. Wow.

  • That people are REALLY interested in web security and the OWASP
    list! Not only that, but Atlanta has a monthly OWASP meetup (thanks
    Shauvik!). I had an absolute ball
    giving my talk and am humbled that my audience enjoyed it as well.

At some point during the talk schedule I stepped out for air. Panting for
breath, I came to Cal and said, “ARGH, you’ve given me too many great
talks to choose from! How can I possibly see them all!” This is a
great problem to have at a conf. There were three rooms with
simultaneous talks on a wide variety of topics, which was a great way
to organize things given the mix of students, non-technicals and
professionals – everyone sees what they want, everyone has a blast.

Then I went back in for round two:

  • How showcase moved from a single dedicated
    server to a redundant multi-AZ Amazon Web Services solution,
    courtesy of Alan. Having worked ops
    for FoxyCart for the past few years it’s nice to see others having
    the same pains (and joys!) of moving to a completely virtualized
    architecture. Plan for failure.

  • How to build near-realtime multiplayer games with Ruby /
    CoffeeScript / HTML5, thanks to
    jweissman on GitHub. I love game
    programming, and I love the web, so this was a tasty treat combining
    the best of both. Joseph’s got huge ideas for this and I found his
    talk both interesting and inspiring.

  • All about Riak CS, thanks to
    Hector. If you don’t know, Riak CS
    is a “host your own” S3-compatible clustered storage system built on
    top of Riak. Thanks to some awesome hacking, Hector’s made it
    possible to fire up a full featured local cluster in a matter of
    minutes. I haven’t even begun to digest what this means for my
    development process – I can have my own private S3 that works
    offline with existing tools! Amazing.

  • Huge and interesting things about community leadership and
    development from Keith,
    Jacques, and the rest. I’m
    amazed and excited about the healthy growth of the Atlanta and
    Nashville tech communities. Thanks to these leaders, technical
    communities are alive and well outside of the “Big City / West Coast
    / Silicon Valley” venue. It’s a great time to be a developer.

  • The ins and outs of running an international design firm courtesy of
    Maarten, who wins my “furthest
    traveled” award – he flew out from Amsterdam for the conference!

I’m so fired up right now that my typing can’t keep up with my
brain. Atlanta has an amazing tech community, and hanging out with
this crowd has given me great hope for
what we’re doing back home in little ol’
Augusta.

I’m sure I’ve forgotten details – a high concentration of awesomeness
can do that to a person. Can’t wait for CoderFaire ATL 2014!

If you missed my talk, I put the slides and an OWASP top 10 cheatsheet up on Lanyrd. You might also enjoy my sysadmin blog over at practicalops.com.

Posted in blag

Phun with XML!

File this one under “the devil is in the details.”

While doing some Python hacking, I found this interesting article on a library called defusedxml which defends Python against some XML processing based attacks.

I was curious about how it might affect PHP (which, like most scripting languages, uses libxml), and I found this StackOverflow thread on trying to avoid these same attacks.

In Brief

There are two related vulnerabilities, and both are pretty old (almost 10 years). One’s called “Billion Laughs” and the other is “quadratic blowup.”

The short of it is that a specially-crafted XML document can make a server consume a LOT of memory very quickly. This attack is courtesy of XML entities (like `&amp;` in (X)HTML). Part of XML is that you can declare a bunch of arbitrary entities at the start of a document and the parser will try to do something sane with them.

Quadratic Blowup

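If you define `&a;` as a long string (`aaaaa…`, say 10,000 characters) and then reference `&a;` 10,000 times in the document, it’s pretty easy to see how it can eat up memory: a document of a few dozen kilobytes expands to 100 million characters.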

Billion Laughs

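And if you make `&a;` expand to several copies of `&b;`, which expands to several copies of `&c;`, and so on for a few levels, a single reference balloons exponentially. (A related trick makes `&a;` expand to `&b;` which expands back to `&a;`, so you get a loop.) Either way, the parser just goes away and chews up CPU and memory till the cows come home. Nowadays parsers limit entity expansion and do recursion checks to avoid this, but certain flags can still make the problem happen.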
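To put rough numbers on both attacks, here is a small Python sketch that only does the arithmetic and never parses anything. It assumes the classic shapes of the two payloads: billion laughs as nested entities (each level referencing the one below it several times), and quadratic blowup as one long entity referenced over and over. The function names and the specific sizes are illustrative and not taken from any library.

```python
def nested_expansion_chars(leaf_len: int, fanout: int, levels: int) -> int:
    """Characters produced by one reference to the top-level entity when each
    of `levels` entities expands to `fanout` copies of the one below it."""
    return leaf_len * fanout ** levels


def repeated_expansion_chars(entity_len: int, references: int) -> int:
    """Characters produced when one entity of `entity_len` characters is
    referenced `references` times in the document body."""
    return entity_len * references


# Classic billion laughs: a 3-character leaf ("lol"), nine levels of
# entities, each referencing the previous level ten times.
laughs = nested_expansion_chars(leaf_len=3, fanout=10, levels=9)

# Quadratic blowup: a 10,000-character entity referenced 10,000 times.
# The document itself is only about 10,000 + 10,000 * len("&a;") characters.
blowup = repeated_expansion_chars(entity_len=10_000, references=10_000)
document_chars = 10_000 + 10_000 * len("&a;")

print(f"billion laughs expands to {laughs:,} characters")
print(f"quadratic blowup: {document_chars:,} characters in, {blowup:,} out")
```

The nested version grows exponentially with the number of levels, which is why a document of under a kilobyte can ask for gigabytes.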

How to fix it

Does your webapp receive and process XML? You’d better know how your language/framework handles XML and whether or not you can get hit by this.

It looks like the most common approach is to use the parser to remove the DOCTYPE directive at the start of the document. Pretty easy, as it shows up as a separate node in the DOM. Most parsers will protect against the “billion laughs” attack, but it’s probably worth writing a test for that too.
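As a rough Python illustration of the same idea, the sketch below refuses any document that carries a DOCTYPE, using only the standard library's expat bindings. Note that it rejects the document outright rather than stripping the DOCTYPE node, which is a stricter variation on the approach described above. `DoctypeForbidden` and `assert_no_doctype` are names made up for this example.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an incoming document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this when it reaches the DOCTYPE declaration, before any
    # entity reference in the document body gets expanded.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def assert_no_doctype(xml_bytes: bytes) -> None:
    """Parse the document once, refusing it if a DOCTYPE is present."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = _reject_doctype
    parser.Parse(xml_bytes, True)


hostile = b'<!DOCTYPE x [<!ENTITY a "aaaa">]><x>&a;&a;</x>'
harmless = b"<order><item>widget</item></order>"

for label, payload in (("harmless", harmless), ("hostile", hostile)):
    try:
        assert_no_doctype(payload)
        print(label, "accepted")
    except DoctypeForbidden as err:
        print(label, "rejected:", err)
```

A check like this also makes a handy unit test: feed it a tiny entity-bearing document and assert that it is refused.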

The root of the problem: XML isn’t strictly a data format

XML is also a document structure specification language (that’s what the DOCTYPE block is for). It’s also a way to mix different types of documents together (namespaces and DTDs). Depending on what you build, you might need some of these features to do your job.

That said… if you don’t need those features, turn them off, or use a library that does so automatically (like defusedxml).

Your Rails friends might seem a little grouchy lately: they found out the hard way that YAML is not strictly a data format. It’s also a *serialization* format. Without safeguards, the YAML parser will happily create new objects, any object you like! How about creating some objects that run shell commands on the remote server? Too easy. It’s as if someone printed in big letters, “what do you want to hack today?” (See the Kalzumeus article I link to at the end of this if you don’t think this is a big deal.)

Et tu, JSON?

Actually, JSON is pretty safe. The main danger is receiving a huge and deeply nested document, but most JSON implementations are efficient enough to handle those. It might make sense to put a size limit on the JSON document accepted (are ya really gonna need to accept JSON docs bigger than a few hundred kilobytes?)
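A size cap like that takes only a few lines. This is a minimal Python sketch, assuming the raw request body is already in hand as bytes; the 256 KB limit and the names `MAX_JSON_BYTES`, `PayloadTooLarge` and `load_limited_json` are assumptions for illustration. It does not try to limit nesting depth.

```python
import json

MAX_JSON_BYTES = 256 * 1024  # assumed limit; tune it for your own API


class PayloadTooLarge(ValueError):
    """Raised when a JSON body exceeds the configured size limit."""


def load_limited_json(raw: bytes, limit: int = MAX_JSON_BYTES):
    """Decode a JSON document, refusing anything larger than `limit` bytes."""
    if len(raw) > limit:
        raise PayloadTooLarge(f"{len(raw)} bytes exceeds limit of {limit}")
    return json.loads(raw)


order = load_limited_json(b'{"sku": "widget", "qty": 2}')
print(order["qty"])
```

Checking the length before decoding means an oversized body is turned away without the parser ever touching it.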

JSON is strictly a data format. It doesn’t support self-references, and there’s only one type of document: a JSON object containing strings, arrays, and other objects. JSON makes no effort to link its objects to actual program objects. If you want to do that, you’re on your own, but at least you’ll have fine-grained control over where you send the user input.

Further Reading

There are more attacks on XML: XML Denial of Service Attacks and Defenses

What the Rails Security Issue Means for Your Startup

Even if you’re not a Rails shop, understanding how the YAML-parsing vulnerability works should be a part of your security efforts.

Update (21 Feb 2013):

My colleague reminds me that JSON is not entirely safe, at least where there’s a browser involved.  Forgot about this one!

http://haacked.com/archive/2009/06/25/json-hijacking.aspx

Posted in Uncategorized

Downtime Antipatterns for SaaS owners, ZipCar edition

Saturday afternoon (in the US), ZipCar’s website and phones went down for a couple hours.  For people who were, y’know, relying on them for transportation, this was a very bad time.  On top of that, they said very little during the downtime, leaving a bunch of customers wondering “WTF IS HAPPENING OVER THERE?!”

So, without further ado, let me present:

Downtime Anti-Patterns, courtesy of ZipCar

1.) Make a service that people rely on every day for something fundamental, like getting from point A to point B.

2.) Have your web site’s DNS entry on a 5 minute TTL, but don’t do any kind of DNS failover.

3.) Put your VOIP phone system in the same datacenter as your web servers.  That way, when there’s a DC-wide outage, no one can get in touch!

4.) Market to a hip, connected, social-network-using market and then, when the shit hits the fan, don’t update them for an hour or so.  They’ll understand.

5.) Don’t have a public “status” page.  People don’t need to know what’s going on with your service.

(I strongly disagree with the conclusion that anyone should be fired.  But look how upset people are, and rightly so—they have no idea what’s going on!)

6.) Apologize publicly but don’t make a public postmortem explaining why this happened and what steps have been taken to fix the problem.

What Should Have Happened

First off, let me say that I respect the ZipCar IT team, and any team that has to put out server-related fires.  It’s a tough job that can mean seriously long hours, lost weekends, dropping out of a family gathering to answer an urgent page, and more.  Keeping information systems running and available to the world 24/7/365.25 is NEVER an easy feat.

That said, here’s what I would have done differently:

1.) Give the IT team access to the company’s social network accounts, give them tools to post status updates quickly, and teach them to update customers as soon as a problem’s detected.

2.) Use an automated system to point DNS entries to a “sorry, we’re down, please see http://status.zipcar.com” page running on a commodity VPS in a completely different datacenter.  Provide useful information to the customer RIGHT AWAY, and don’t leave them wondering why the page isn’t loading.

3.) Have a status.zipcar.com already built.

4.) Do a PUBLIC postmortem analysis on the problem — What happened?  How did we hear about it?  Why didn’t we know sooner?  What surprised us?  What are we doing to make sure this doesn’t happen?
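The “automated system” in point 2 can start out very simple. Here is a hedged Python sketch of just the decision logic: count consecutive failed health checks and, past a threshold, signal that DNS should point at the status page. The class name, the threshold of three, and the choice to fail back on the first healthy check are all assumptions; the actual DNS update depends on your provider's API and is left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks health checks and decides when to serve the status page."""

    threshold: int = 3            # assumed: failures in a row before failing over
    consecutive_failures: int = 0
    failed_over: bool = False

    def record(self, check_ok: bool) -> bool:
        """Record one health check; return True while the status page
        should be served instead of the main site."""
        if check_ok:
            # Assumption: fail back as soon as a check passes again.
            self.consecutive_failures = 0
            self.failed_over = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.failed_over = True
        return self.failed_over


monitor = FailoverMonitor(threshold=3)
history = [monitor.record(ok) for ok in (True, False, False, False, True)]
print(history)
```

Requiring several failures in a row keeps a single dropped probe from flipping DNS back and forth.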

Look at those 4 points — I’m not criticizing the technical ability of the team and their ability to build redundant systems.  What’s in view here is their public reply, or rather, the lack thereof, which obviously and publicly left a number of their customers upset.

I understand that ZipCar’s a large company, that they have media and press people, social network consultants, policies and standards for all public communication.  Are they really so large that one person can’t whip out their iPhone and post a quick tweet within 5 minutes of a system fault?

Here’s my maxim:

Customers can forgive and forget downtime, but communication misfires will always be remembered.

Hey, remember when Netflix was going to stop sending out DVDs and change their prices?  Yeah, that went over well.

But what does this mean for a small SaaS company?

Most of us don’t have as many customers as ZipCar.  Heck, we wouldn’t know what to DO with that many customers, except maybe swim in pools of money like Scrooge McDuck.  There’s an unstated flipside to antipattern #1: build a service where downtime doesn’t ruin a person’s day.

When ZipCar goes down, people can’t find or reserve cars.  Appointments get missed.  People get very upset.

The same happens for an e-commerce service: FAILING to take money does not make happy customers.

Point-of-sale systems?  Your client’s customers can’t buy things, which makes them all very unhappy.

Medical services?  Air traffic control?  People are gonna die.

But what about when Basecamp goes down?  Oh no, I can’t manage my project right now, oh well… better get back to work.  I can sort it out later.

Freckle?  Harvest?  Uh oh, I can’t track my hours.  I’ll just write them down on a piece of paper and enter them later.

When a service like that goes down, it’s at most an inconvenience.  That doesn’t exempt any of those companies from full and immediate communication when there is a problem.  However, no one’s day is  completely ruined by timesheet problems.

My point is this: if you’re a small company, think about what happens when your customer can’t get to your service.  Empathize, put yourselves in the customer’s suede loafers, and think — man, that site’s down, so now is my day totally ruined?

Just starting out?  DON’T make a service that’s going to need impeccable uptime.  Actively avoid it, and mercilessly kill features that take steps in that direction.

Recognize that you don’t have the resources to make it happen — people to work on servers, cash for redundant servers in multiple datacenters, etcetera, etcetera.  Uptime is expensive.

Remember the key lesson here: customers can forgive downtime.  Be honest and humble when you screw up, and don’t pretend to be a massive company.  In fact, given ZipCar’s example, do the complete opposite of what big companies do.  Learn from your mistakes, and share what you learned with your customers.  At the end of the day, they’re the people who make or break your business.

Posted in blag, sysadmin, thoughts

The site’s down! Quick! Tell the customers!

Today Zerigo, a major DNS provider, went offline for an extended vacation.

It’s going on 10 hours since they’ve gone down and their status page and Twitter haven’t been updated in over two hours.  Does this make me happy as a customer?  (NO, NO IT DOES NOT AT ALL — ed.)

Long-term success comes from making customer relationships the HIGHEST priority.  I’d even place it one peg higher than actually fixing the problem.  Of course the problem needs to be fixed, and right quick!  But unless the problem is truly a “one minute fix,” sending information to the customers should be the very first thing that an Ops team does!

“The site’s down, I don’t know why”

This is what’s running through your customer’s mind.  All it takes is ONE tweet, one update to the status page: “Hey, we know this is down!  We’re on it and will let you know what’s up ASAP.”  In the absence of that tiny scrap of information customers are left with… vague uncertainty, and the seeds of distrust.

Don’t let that distrust grow for even a second—tell the customer NOW what’s going on and fix it as soon as possible.  If it takes longer than 10–15 minutes to fix, update customers again so they know what’s happening.  Keep updating them until the problem’s completely fixed!

Communicate Consistently

This is something we try very hard to do at my current workplace.  We have businesses relying on us for their main revenue source, and we necessarily take downtime very seriously.

But we still let the customer know what’s happening—almost as soon as we know it ourselves!  This means working together as a team, planning in advance, and having tools and systems in place to make this kind of communication as easy as possible.

“Tell the customer first” is the opposite of usual Ops behavior—when downtime hits, updating the status page is the LAST thing that’s usually thought of.  FIRST, put out the fire, then update status and Twitter when the flames die back.  If you were quick enough, maybe no one noticed the downtime and you don’t have to update at all!

What’s the problem with this approach?  It’s utterly invisible to the customer.  The customer is indeed well served by a quick resolution to the problem.

But what if it’s not quick?  And what about the 100 people that DID notice during that “one minute” that aren’t going to email you demanding explanation?  How many will simply go elsewhere?

Empower your Ops

This is why Ops teams should have access to your company’s Twitter feed.  This is why you need a status page.  Posting to these things should be as low-friction as possible.  This is why you need a culture of sharing critical information with customers as soon as you’re made aware of it.

Customers can forgive downtime, even extended downtimes, so long as there’s a good explanation, a sincere apology, and steps taken to ensure that it won’t happen again.  Communication goes a long way toward keeping good will, and I’d rather err on the side of over-communication than leave people scratching their heads, gnashing their teeth, and pounding their keyboards in frustration.

Posted in blag, sysadmin

MySQL: 20 questions every sysadmin faces

MySQL: it’s a great tool!  Starts off as a simple bucket you can throw data into, save it in between requests, and use SELECT to contort it into any view you need.  Hey, I’m using MySQL, my data problems are SOLVED.  Forever.

Forever?  You sure about that?  How do you take snapshots of your data?  Accidents happen, so it’s prudent to protect against them.

Ah, so you found mysqldump.  Cool!  That’s great when your data set is a couple hundred megabytes.  Did you know that it’ll lock your database server solid while taking backups when your data set outgrows that tiny limit?  Ever tried to restore a mysqldump backup?  Yeah, it takes almost the same amount of time to restore as it does to back up.

So pretty quickly mysqldump becomes an antipattern.  Aha, mysqlhotcopy to the rescue!  Wait, it’s 2012 and you’re using MyISAM tables?  I guess that’s OK, if you don’t care about ACID guarantees, transaction isolation and rollback, row level locking, or crash recovery.

mylvmbackup is a great tool if your data is stored on an LVM volume.  Beyond that, it’s time to look at InnoDB backup tools (you’re using those, right?) or using replication and taking the backup on the slave.

Oh, replication!  I see you nodding your head.  That’s where the data on one server is copied to the other server instantly, right?  Welllll…. mostly.

Did you know that there’s no consistency guarantee between master and slave?  Yep, the slave just reads whenever it’s ready and does no deep analysis of the tables to make sure that what it’s replicated is completely consistent with the master.  It should always be consistent, right?  Hmm… I’m not sure.

It’s time to learn about pt-table-checksum and find out how to compare your master and slave data to make sure nothing goofy is going on. Oh, and you can’t trust the “seconds behind master” counter on the slave, sorry.  Better set up pt-heartbeat to make sure that your replication is working as well as you think.
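The idea behind a heartbeat check is easy to sketch: something writes the current time into a table on the master at a steady interval, that row replicates, and the slave compares the replicated timestamp with its own clock. The Python below shows only that comparison. It is a rough illustration, not pt-heartbeat's actual implementation, and the function names and 30-second threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone

MAX_ACCEPTABLE_LAG = timedelta(seconds=30)  # assumed alerting threshold


def replication_lag(heartbeat_on_slave: datetime, now: datetime) -> timedelta:
    """Lag is the age of the newest heartbeat row the slave has applied."""
    lag = now - heartbeat_on_slave
    # Clock skew can make the heartbeat look like it is from the future;
    # treat that as zero lag rather than a negative value.
    return max(lag, timedelta(0))


def is_lagging(heartbeat_on_slave: datetime, now: datetime,
               max_lag: timedelta = MAX_ACCEPTABLE_LAG) -> bool:
    """True when the slave is further behind than we are willing to accept."""
    return replication_lag(heartbeat_on_slave, now) > max_lag


now = datetime(2012, 6, 1, 12, 0, 45, tzinfo=timezone.utc)
last_heartbeat = datetime(2012, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

print(replication_lag(last_heartbeat, now))
print(is_lagging(last_heartbeat, now))
```

Because the measurement comes from real replicated data, it keeps working in the cases where the slave's own counter is misleading, as long as the two servers' clocks are in sync.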

Ah, so you found out about read-splitting to reduce the load on the master!  Very good, it’s a great technique if you can use slightly out-of-date data in your application.  Your slaves are read only, right?  Yeah, MySQL won’t stop you from making inconsistent data sets on different servers.  Again, see pt-table-checksum, this needs to be checked at least weekly and for goodness sakes set ALL of your servers to start up read_only.

Ah, so now you want durability in the face of a master failure.  Cool!  It’s nice to have if you have a spare database server.  How’s your Linux-HA and virtual interface knowledge?  Know how to promote a slave to a master?  Considered using HAproxy?  How are you going to make sure that the slave’s read ALL of the master’s binary log before it takes over?  In fact, what happens when it CAN’T read the master’s log and you need to start it up?  How do you reconcile those “in flight” changes that haven’t been replicated?  Did you also check out DRBD? MySQL Multi-Master Manager (MMMM good)?

Semi-synchronous replication, you say?  Great, you’re using MySQL 5.5!  You’ve looked through the changelog and made sure all of your queries are compatible with the new version of MySQL, right?  Done any testing on the upgrade to make sure that it’ll still perform well on your dataset?  (Hint: pt-upgrade)

Ah, now you want to scale up with MySQL Cluster!  Yep, MySQL (or Percona) cluster is a great thing to have, and you get the consistency guarantees that plain MySQL replication doesn’t make.  Make sure you’ve got at least 3 nodes and are prepared to fix problems as soon as one goes down.  You’re monitoring all of this, right?

Oh, and you’ve totally got data warehousing covered, right?

These questions and their answers are the life of a DBA and systems administrator.  There are a thousand questions that no one ever thinks of when they choose a technology or application platform.  Hell, most of those you don’t even know until you’re entrenched and 3 months down the road scaling things up and out.

None of this is to knock MySQL; it’s a great piece of software and it’s got plenty of big-iron traction (cf. Facebook).  Heck, Oracle liked it so much they went out and bought themselves a shiny new MySQL AB.

If you can’t tell, I’ve been entrenched in MySQL docs and blogs and tools for a week and my head’s about to burst. :-)

This is why, when I get a chance, I dink with tiny little databases like Redis and SQLite.

Redis is so very, very simple it’s a beautiful thing to work with.  It’s not a good fit for every data set, but there are niches where it just flies.  Message middleware, for example — would you rather have one tiny Redis server or a whole Stomp/AMQ stack and attached gear?

Why do we make these things so complex?  (Computers are hard, let’s go shopping!)

Oh wait, now I remember.  Answering these questions is why I get paid the big bucks.

Posted in blag, sysadmin, thoughts, Uncategorized

“What’s Fred Using?,” the “back from the ashes” edition.

Yesterday my Mac’s filesystem had a bad day.  Not just an “I don’t want to get up this morning” kind of day, but the kind of day that ends screaming on the roof, hostage in hand, while the police try to talk you down on a megaphone and the SWAT team creeps around from behind to administer the final solution.

This is the first time I’ve seen Disk Utility give up and say, “Sorry bro, I can’t fix this.  Back up what you can and run for your life.”  No matter, I make backups and most of my data is sync’d with remote services and systems.  Two hours later, I’ve got a freshly formatted Lion system and a small handful of irreplaceable backup files.

And I’m standing at a crossroads: restore from Time Machine, or start from scratch and copy what I need?  I decide to go the road less traveled and start with a blank slate.

It was actually liberating to realize how little I need outside of my ~/.vim and ~/.vimrc, and how easy it was to grab those pieces using GitHub & Mac Homebrew.  I accidentally lost my SSH config but, no matter, I took ye olde backup (that thing was dusty) and stripped out everything but the dozen-odd hosts that I care about connecting to.

Here’s what I installed, in approximate order:

• Factor (http://factorcode.org).  I can’t live without a Factor environment on hand.  I love Factor and have a few indispensable personal tools I’ve written with it.

• Dropbox — no explanation needed.

• 1Password — the keys to the kingdom.

• XCode Command-Line tools.  A few hundred megs vs. the full install, and I don’t do Mac or iOS app development.

• Mac Homebrew — cure for the bad old days of Fink/DarwinPorts.

• GitHub for Mac.  At first I thought it was gimmicky, but being able to click “Clone in Mac” on any project in the web browser and have the repo on my hard drive seconds later is too awesome.

• Tapped the homebrew duplicate repository and compiled the latest Vim.  Why didn’t I use the vim that comes with Mac OS X?  No clipboard register, that’s why, and I like my vim with all the fixin’s.

• fish (via brew) — my favorite shell for home.  Universal variables, friendly colors, and intuitive command line completions make me happy.  Setting PATH to prefer the brew versions of utilities over the built in ones is as simple as `set -gx PATH /usr/local/bin $PATH; set -U PATH $PATH` — no config files to fuss with.

• Spotify — Gotta have music.

• NetNewsWire — Feed me.

• Cloud.app — instant screenshot uploads that put the link on my clipboard.  What’s not to like?

• Alfred (alfredapp.com) — Won my heart after LaunchBar.  Hotkeys for all of my most used apps.

• Propane (propaneapp.com) — such an awesome app.  Completely worth $20, I keep it open to chat with my team whenever we’re working.

• Chrome — which, with the profile associated with my work account, downloaded my bookmarks along with my favorite extensions: Vimperator, 1Password, and Block Yourself from Analytics.

• Sparrow — This was an interesting decision.  I’ve used Mail.app for a long time but have gotten sick of its bloat and slowness.  I decided not to restore my multi-gig mailboxes (it’s all on Gmail/Google Apps anyhow) and use a lightweight client.

The main feature keeping Mail.app in my toolbelt was S/MIME (encrypted) email support, but I found an acceptable replacement in the S/MIME support in iOS 5, for the few times I need to read and reply to a sensitive email.

Things that I love about Sparrow:

• Gmail keyboard shortcuts (hjkl!)

• Wicked responsive.

• Growl notifications

• Easy to quickly reply to email from friends or team members who need a timely response.

• TextExpander — I have some cool abbreviations that are now hard-wired to my fingers, so I gotta have it.  My useful abbreviations were in Dropbox so it took no time to get it going again.

• NValt — a version of Notational Velocity that has some cool tricks like Markdown formatting.  Discovered by accident when I went to reinstall NV.  So far, I love it.

• iTerm2 — it seems a little faster than Terminal, and it makes pretty colors with Solarized.

Things I got rid of:

• MacVim — nice, but since I’m always swapping between a terminal window and vim, why not just run vim in the terminal?  With the “all the fixin’s” version I built and iTerm I have mouse and clipboard support, so there’s no reason to use it anymore.

• XCode / Developer stuff.  I need a gcc/clang toolchain, not a full iOS development environment WITH documentation AND examples.

• A lot of cruft that’s accumulated in the form of checked out projects, dotfiles, preferences, apps and brew packages that I don’t use, and preference panes that do various unauthorized things to the underlying parts of OS X.  I don’t even miss them.

• All told, 25G of crap that I wasn’t using.

I guess that it’s not such a minimal list after all, but the environment feels lightweight and fast.  Maybe not awesome (awesome.naquadah.org) on Arch Linux fast, but fast.  With the SSD I upgraded this MacBook with last year, Lion is as quick as ever.  But is it quick enough?  Of course not!

Out of curiosity I disabled swapping according to this tip (http://hints.macworld.com/article.php?story=201106020948369).  Since I’m not running Mail.app and have traded down in terms of memory-hungry apps, I figure my 3GB of RAM should be enough.

Reboot, and it’s like magical gnomes climbed inside my computer and made every chip 10x faster.  Holy wow.  All of the frustrating slowdowns from the past few months? GONE.  Any hint of slowness?  Banished by a passing breeze, a breeze that smells like awesome win.

You know, it’s all about responsiveness.  A 2GHz CPU and 3G of RAM doesn’t mean anything if the machine randomly hangs when I’m doing a task.  When I press keys and move the mouse the computer needs to DO something, NOW.  This is something that Apple has always cared about and it’s still one of the best things about my Mac.  It’s in Jef Raskin’s seminal work on user interface design, and it’s considered a primary feature of an interface.  If a tool doesn’t respond right away when I use it, then I hate the tool.

Right now?  No hate baby, nothin’ but love for my new/old rig. :-)

Posted in blag, Uncategorized

“Remembering Steve,” or “20 years of the Happy Mac”

It started with something called an “Apple II” in a darkened room at my elementary school. Oregon Trail, Odell Lake, turtle graphics, writing our first stories and printing them out.

HyperCard, hours making flipbooks and puzzling through scripting on my salvaged Mac SE 30 (30 for 30 meg hard drive). Happy times spent with my friend, who is also gone, building new worlds, bending the machine to our will, fueling our imagination.

It’s what I used to write my first newsletter, print and distribute it to my 4th grade class. The first time I’d ever seen a CD drive or played a multimedia game.

The first computer I seriously wanted: a Color Mac Classic. The first time I started saving my money to buy some new hardware. (Only $600!)

Posted in blag, thoughts, words

Copy and Copy JSON: essential Listener hacks.

Something I’ve really wanted in the Listener for a while is the ability to right-click and copy a string to the clipboard. I’ve been working around that by having my own “>clipboard” word that places a single string argument on the clipboard — extremely handy for debugging long HTTP responses which get truncated at the right-side of the window.

: >clipboard ( str -- )
clipboard get set-clipboard-contents ;

This is from Factor’s useful ui.clipboards vocab, which is the most concise and least ugly way I’ve seen to handle clipboards in a cross-platform GUI. clipboard is a global reference to the current clipboard object whose methods are overridden to handle the dirty details of the current platform’s clipboard.

Along came this useful post about adding an ‘open url’ command for URLs in the listener, and I saw a path to my goal: ui.operations and the define-operation word.

Combined with unparse to get a string representation of an object, this adds a “Copy” command to the Listener’s context menu for all objects:

: copy ( obj -- )
unparse >clipboard ;

[ drop t ] \ copy H{ } define-operation

In this brave new Web 2.0 world, it’s dangerous to go alone! Take this JSON with you*:

: copy-json ( obj -- )
>json >clipboard ;

[ string? not ] \ copy-json H{ } define-operation

Load up the vocab (or paste those definitions in the Listener), and voilà, the GUI is instantly extended with new context menu options.

As usual, this is all available on GitHub.

*(Sorry for the Zelda reference, just spent a happy weekend playing a big chunk of the awesome LD20 48-hour game compo entries)

Posted in code, factor

Factoring the Luhn algorithm

I work a lot in the field of e-commerce, and have written at least two shopping carts. Anyone who has implemented any kind of payment processing probably knows about the Luhn algorithm, which is a simple test that one can apply to a credit card number to make sure that the customer entered it correctly.

Factor and, I believe, concatenative languages as a whole can be very expressive when it comes to describing and implementing an algorithm. Let’s explore this.

Meet the Luhn

First, let’s look at an informal explanation of the algorithm:

The formula verifies a number against its included check digit, which is usually appended to a partial account number to generate the full account number. This account number must pass the following test:

  1. Counting from the check digit, which is the rightmost, and moving left, double the value of every second digit.
  2. Sum the digits of the products (eg, 10 => (1 + 0) => 1, 14 => (1 + 4) => 5) together with the undoubled digits from the original number.
  3. If the total modulo 10 is equal to 0 (if the total ends in zero) then the number is valid according to the Luhn formula; else it is not valid.

The first whack

: luhn-check ( number-seq -- passes? )
reverse #! Moving from right to left,
double-every-other #! double every other digit.
sum-digits #! Add the digits of the sequence, then
multiple-of-ten? ; #! test that the result is evenly divisible by 10.

The Factor description of the algorithm reads like a close approximation of the English one. Interestingly, if you enter that word definition into Factor’s listener (i.e. its REPL), Factor will note that double-every-other, sum-digits, and multiple-of-ten? are unknown words. You can, however, “defer” the definition of those words, and Factor will then accept the definition. Pause for a moment here: Factor accepts, without question, words that it can’t yet evaluate or run, which lets me start with this simple top-level definition and “fill in the blanks” as I go.
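
The same thing can be spelled out in source with Factor’s DEFER: parsing word, which declares a word before it has a body. This is a minimal sketch, assuming that explicit deferral behaves like the listener’s defer restart:

```factor
USING: kernel sequences ;
IN: scratchpad

! Declare the helper words up front so the top-level definition parses.
DEFER: double-every-other
DEFER: sum-digits
DEFER: multiple-of-ten?

: luhn-check ( number-seq -- passes? )
    reverse double-every-other sum-digits multiple-of-ten? ;
```

Calling luhn-check at this point would fail, since the deferred words have no definitions yet. It only becomes runnable once the three helpers are filled in.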

Craft the parts

First, we need a word to double every other value in a sequence. That is, given the sequence { 1 2 3 4 5 6 }, we should get the result { 1 4 3 8 5 12 }.

Well, Factor supports a number of Lisp-like sequence combinators so this should be simple:

: double-every-other ( seq -- seq-doubled )
[ odd? [ 2 * ] when ] map-index ;

For those new to Factor, [ … ] or “quotation” is the equivalent of (lambda … …) or function (…) {…} in your language of choice; that is, it defines an anonymous function. (For those who care about such linguistic details, “scope” or lexical bindings are an optional feature in Factor, in the locals vocab). Since Factor is concatenative, parameter passing is implicit and we don’t have to name or count the quotation’s arguments — Factor will infer them at compile time and warn if the program doesn’t add up.
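
To make that concrete, here is a small sketch, assuming a recent Factor release where unit-test from tools.test takes the expected results as an array. A quotation is just a value on the stack until something like call or map runs it:

```factor
USING: kernel math sequences tools.test ;
IN: scratchpad

! call runs a quotation against whatever is already on the stack.
{ 6 } [ 3 [ 2 * ] call ] unit-test

! map runs the same quotation once per element; no argument names needed.
{ { 2 4 6 } } [ { 1 2 3 } [ 2 * ] map ] unit-test
```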

Whereas map calls the quotation with each element of a sequence, map-index passes both the element and its index into the quotation: perfect for modifying every other index with the odd? predicate.

It’s almost readable as English: “Given a sequence, produce a new sequence, like so: if an element is at an odd index in the original sequence, then double that element in the new sequence.” Yes, it sounds a bit stilted when read aloud, but it concisely and accurately conveys the idea to the listener.

A few quick tests in the listener show that this word works as expected.
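
Those quick tests might look something like this sketch. The edge-case values are mine, and the { expected } [ code ] unit-test form assumes a recent Factor release:

```factor
USING: kernel math sequences tools.test ;
IN: scratchpad

: double-every-other ( seq -- seq-doubled )
    [ odd? [ 2 * ] when ] map-index ;

! Elements at odd indexes (1, 3, 5) are doubled; the rest pass through.
{ { 1 4 3 8 5 12 } } [ { 1 2 3 4 5 6 } double-every-other ] unit-test

! Edge cases: an empty sequence and a single element are left alone.
{ { } } [ { } double-every-other ] unit-test
{ { 7 } } [ { 7 } double-every-other ] unit-test
```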

Next we need a word that sums the digits of a sequence. That is, for the sequence { 1 11 5 }, this word should produce the sum (1 + (1 + 1) + 5) = 8.

Well, the sum word in Factor adds the numbers in a sequence together, but we need to treat two-digit numbers specially. Using integer arithmetic, the way to sum the digits of a two-digit number is to divide the number by 10 (num / 10) and add the remainder (num mod 10). Handily, Factor provides a /mod (read “divmod”) word which does both in one step. Let’s have a go at sum-digits:

: sum-digits ( seq -- sum )
[ 10 /mod + ] map sum ;

Nothing surprising there. “Produce a new sequence like so: for each element of the sequence, apply the /mod word to get the quotient and remainder, then add them together, placing the result in the new sequence. Sum the numbers in the new sequence.” Again, stilted, but readable.
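
Here is a quick sketch to check both /mod and sum-digits, again assuming the array form of unit-test from a recent Factor release. One caveat: 10 /mod + only yields a true digit sum for numbers below 100. That is all Luhn needs, since a doubled digit is at most 18.

```factor
USING: kernel math sequences tools.test ;
IN: scratchpad

: sum-digits ( seq -- sum )
    [ 10 /mod + ] map sum ;

! /mod leaves the quotient and then the remainder on the stack.
{ 1 4 } [ 14 10 /mod ] unit-test

! 1 + (1 + 1) + 5
{ 8 } [ { 1 11 5 } sum-digits ] unit-test

! Beyond two digits this is no longer a digit sum: 123 gives 12 + 3.
{ 15 } [ { 123 } sum-digits ] unit-test
```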

On to the last word: multiple-of-ten?. This is simple: test whether a number is evenly divisible by 10, that is, whether the remainder of dividing the number by 10 is 0, or more simply: num mod 10 == 0.

: multiple-of-ten? ( n -- ? )
10 mod 0 = ;

Very small, and easy to visually inspect and test.

Now we’re cooking with Luhn!

Well, that’s actually all there is to write; we’ve implemented all the parts of the Luhn algorithm. The astounding thing is that we started with a simple definition and implemented the entire algorithm without changing the original definition!
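
Put together in one place, the whole thing might look like the sketch below. Here the helper words come first, so nothing needs deferring. The string>digit-seq word and the test numbers are my additions for illustration (79927398713 is a commonly used Luhn test number), and the unit-test form assumes a recent Factor release:

```factor
USING: kernel math sequences tools.test ;
IN: scratchpad

: double-every-other ( seq -- seq-doubled )
    [ odd? [ 2 * ] when ] map-index ;

: sum-digits ( seq -- sum )
    [ 10 /mod + ] map sum ;

: multiple-of-ten? ( n -- ? )
    10 mod 0 = ;

: luhn-check ( number-seq -- passes? )
    reverse double-every-other sum-digits multiple-of-ten? ;

! Hypothetical helper: turn a string of ASCII digits into a digit sequence.
: string>digit-seq ( str -- seq )
    [ CHAR: 0 - ] { } map-as ;

! Step by step: reverse, then double the elements at odd indexes.
{ { 3 2 7 16 9 6 7 4 9 18 7 } }
[ { 7 9 9 2 7 3 9 8 7 1 3 } reverse double-every-other ] unit-test

! The digits of that intermediate sequence add up to 70.
{ 70 } [ { 3 2 7 16 9 6 7 4 9 18 7 } sum-digits ] unit-test

! 70 is a multiple of ten, so the number passes.
{ t } [ "79927398713" string>digit-seq luhn-check ] unit-test

! Changing the check digit from 3 to 0 makes the total 67, which fails.
{ f } [ "79927398710" string>digit-seq luhn-check ] unit-test
```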

A thought experiment

Imagine, as a Python or C++ programmer, if someone handed you a definition of a function and said “this definition can’t be changed, now implement all of the functions it calls.” In order to meet their original syntactic definition, I’d be hard-pressed and restricted in the features that I could use to implement the solution. If they used classes and OO-style code, I’d be stuck with the hierarchy they implied or I’d need to start hacking around them with templating, macros, monkey patching, or other such self abuse. Even if they didn’t use those features of the language, I’d be bound by the function calls and parameter passing in the original definition, which might make my life very difficult as I wrangle the data flow to produce their intended result.

Whereas, in Factor or Forth, handing a well-crafted but fixed definition of a word to someone can lead to an elegant solution that not only looks correct but runs correctly.

Don’t let me be misunderstood

Truthfully, I re-factored the words in this example several times (see my commit history) before coming to this tidy little definition. I think it has to be experienced to be understood: rearranging words and testing them in an interactive environment feels natural and simply right. To paraphrase Chuck Moore, author of the Forth language (speaking of implementing a Bluetooth stack), “first you have to figure out what they’re actually doing [in the algorithm], then you simplify the definition and fill in the parts.”

There is some ineffable “ah hah!” moment that comes when writing a well-factored program which makes me smile inside, some intangible correctness of having a readable definition that not only looks right but is right.

Look at the examples and tell me that a single one of those is more clear than this:

: luhn-check ( number-seq -- passes? )
reverse #! Moving from right to left,
double-every-other #! double every other digit.
sum-digits #! Add the digits of the sequence, then
multiple-of-ten? ; #! test that the result is evenly divisible by 10.

Granted, those aren’t the most shining examples of implementations (hell, I’ve written stuff equally confusing), this is a trivial algorithm, and maybe I’m getting old, but nowadays all of those just look like esoteric nonsense. (Really, look at the Java and Python implementations; way too clever for their own good.)

Now if only I could grow a decent beard to go with these suspenders…

The source to this post is available on GitHub

This is your weblog on Dexy!

Well, this last weekend, instead of writing code or blog posts, I fiddled with Dexy. And I’ve come to one simple conclusion:

Dexy is freakin’ awesome!

Dexy is a documentation writer’s dream tool: it takes words about code, pictures, and original unchanged source code and generates absolutely gorgeous documentation. In fact, Dexy generated the markup for this entire post from just a few small files and automatically posted it to my WordPress blog. Now that is utility!

Here is a Factor script:

! Copyright © 2011 Your name.
! See http://factorcode.org/license.txt for BSD license.
USING: http.client json.reader kernel prettyprint strings multiline ;
IN: weblog.dexyfied

#! @export section-1
: test1. ( -- )
"http://oohembed.com/oohembed/?url=http%3A//www.flickr.com/photos/fuffer2005/2435339994/"
http-get nip >string json> . ;

See that funky @export comment in there? That lets Dexy pull out just a small section of that script:

: test1. ( -- )
"http://oohembed.com/oohembed/?url=http%3A//www.flickr.com/photos/fuffer2005/2435339994/"
http-get nip >string json> . ;

I even got simple interactive Factor sessions going:

( scratchpad ) USE: weblog.dexyfied
( scratchpad ) test1.
H{
{ "version" "1.0" }
{ "provider_name" "Flickr" }
{ "width" "300" }
{ "height" "300" }
{ "author_name" "fuffer" }
{ "title" "echo 2" }
{ "cache_age" 3600 }
{ "type" "photo" }
{
"url"
"http://farm4.static.flickr.com/3092/2435339994_4ab42c3c20.jpg"
}
{ "provider_url" "http://www.flickr.com/" }
{ "author_url" "http://www.flickr.com/photos/fuffer2005/" }
}
( scratchpad )

That’s the result of running this Factor program:

USE: weblog.dexyfied
test1.

It’s re-generated each time that the file changes: no bitrot, just exactly what happens when you type that code into the Factor listener, always in sync with the code on disk. This is sort of like Literate Programming but without doing weird things to the source code while retaining the ability to structure the documentation like, well, a document.

Still a bunch of bugs with the interactive handler, and the splitting of Factor “sections” is a hack, but I hope to improve these things as I go. Already, this beats the old “copy-and-paste” approach by a factor of approximately 10 jillion. Take a look at the source to this post.
