FOSS Unleashed

Software I Use: xs

A couple of years ago, I stumbled upon a shell called xs. It is a descendant of a shell called es, which is itself a descendant of rc (Plan 9’s shell). Unfortunately, the original author no longer wishes to maintain the project, so I maintain my own fork of it.

While I was quite experienced with writing bash scripts, I absolutely hated it. The syntax was excessively unique, and I found myself diving into the manual about half of the time I wrote a script. bash’s string and array handling is horrendously clunky, and I was very much wishing for something that wasn’t so horrid. While xs isn’t perfect, it is massively simpler to write for than bash.

But that isn’t to say that xs is free of idiosyncrasies; it certainly has a few. Let’s go through a quick tour of how to do basic scripting with xs.

a = 1 2 3 4 5

# push a value onto the end of the list `a`
a = $a 6

# unshift a value in front of the list `a`
a = 0 $a

# shift a value out of the list `a` into the list `v`
(v a) = $a

The first thing to be aware of is that xs has lists. All variables are lists. Because of this, array-like access is actually really nice, and you can rely on syntax alone to handle three of the four common array mutations: append, prepend, and shift (pop is the odd one out). When assigning a list to multiple names, one value is assigned to each name, except for the last name, which receives the remainder. The same happens with function arguments.

fn demo {|one two rest|
    echo Here is a single value: $one
    echo Here is another single value: $two
    echo Here is the rest of the values $rest
}

demo 1 2 3 4 5

That produces the following output:

Here is a single value: 1
Here is another single value: 2
Here is the rest of the values 3 4 5

This is quite convenient, and a vast improvement over writing functions for bash, where one has to write boilerplate simply to handle the function’s arguments if the function is even remotely complex. But it’s also the first place I found one of xs’s faults.

Contrary to what one would expect from most other languages, xs flattens all lists. xs has zero capability to handle multidimensional lists: if you call a function with multiple lists, xs creates a list of the arguments, flattens that list, then processes the arguments. This means that unless you pass lists by name instead of by value, you cannot pass multiple lists to a function.
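To make the flattening concrete, here is a rough model of the behaviour in Python (not xs code). The names flatten and bind are made up for this illustration; the sketch only assumes the two rules described above: every list is flattened, and the last name in an assignment takes whatever is left over.

```python
# A rough Python model of xs list semantics (illustrative only, not xs code).

def flatten(*args):
    """Flatten nested lists into one flat list, as xs does with its lists."""
    flat = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flat.extend(flatten(*arg))
        else:
            flat.append(arg)
    return flat

def bind(names, values):
    """Assign one value per name; the last name receives the remainder."""
    values = flatten(values)
    bound = {}
    for i, name in enumerate(names):
        if i == len(names) - 1:
            bound[name] = values[i:]
        else:
            bound[name] = values[i:i + 1]
    return bound

# Equivalent of: demo 1 2 3 4 5
print(bind(["one", "two", "rest"], [1, 2, 3, 4, 5]))

# Passing two lists by value: the boundary between them is lost.
print(bind(["first", "second"], [["a", "b"], ["c", "d"]]))
```

In the second call, first ends up holding only a, while second holds b, c and d. That is exactly why two lists cannot be passed by value.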

Shell detection: bash, rc or xs?

A while ago, I needed (for an admittedly fairly silly reason) a file that could be sourced by both bash and xs, and which would then source the appropriate files for the shell. This means the file had to be both legal bash and legal xs. At the time I was aware of rc (and es), so now we’re here to do the same with rc and xs:

# bash
test -n ''^'' && { echo bash; exit 0; }
# rc/xs
test -z ''^'' && { fn testx { ~ $1 asdf && echo rc || echo xs}; testx asdf }

This has a few caveats. Firstly, the bash test must come first. Secondly, an external test program must exist in the path. Thirdly, when calling from an instance of xs, the command’s first argument cannot be “asdf”. Finally, bash requires the trailing semicolons, and it needs to exit before it attempts to parse the rest of the file.

How does this work?

First off, we filter for bash. We use the ''^'' string in both tests. bash resolves this to the string ^, which is non-empty, and test -n tests for a non-empty string. However, both xs and rc resolve it to an empty string, since ^ is the concatenation operator and the concatenation of two empty strings is an empty string.

Then we follow through and (ab)use the difference in how the two shells pass arguments. rc does not support named arguments, so inside a function the $* and $1 variables are set to the function’s arguments. xs supports named arguments, and in fact requires them: $* and $1 are always set to the arguments of the script.

You might also note the lack of if commands. There is no if command that is syntactically valid in more than one of these shells. Instead we must rely on logical short-circuiting via the && and || operators; thankfully all three shells accept the same syntax for those.

Password Generation with `dd`

Here’s a short one-liner to generate a 76-character password in Bourne shells (e.g. bash, zsh):

dd if=/dev/urandom count=1 bs=57 2>/dev/null | base64

The same in rc-based shells:

dd if=/dev/urandom count=1 bs=57 >[2]/dev/null | base64

On certain systems you might need to use /dev/random instead.

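One thing to note is that the block size of “57” seems a little particular. base64 turns every 3 bytes of input into 4 characters of output, so 57 bytes become exactly 76 characters. Any larger and the output would spill over onto another line, as the base64 program line-wraps its output (GNU base64 wraps at 76 columns by default). Additionally, if you wanted multiple passwords, you could provide a different value to count=. Or make a function: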

# bourne shells
randpw() {
    # default to a single password when no count is given
    count="${1:-1}"
    dd if=/dev/urandom count="$count" bs=57 2>/dev/null | base64
}

# xs
fn randpw {|count|
    if {~ $count ()} {
        count = 1
    }

    dd if=/dev/urandom count=^$count bs=57 >[2]/dev/null | base64
}
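The choice of 57 can be checked with a little arithmetic. The following Python sketch is only an illustration: encoded_length is a hypothetical helper, and the 76-column wrap is an assumption based on the default of GNU base64 (other implementations may not wrap at all).

```python
import base64
import os

WRAP = 76  # assumed default line width of the base64 program (GNU coreutils)

def encoded_length(nbytes):
    """Length of unwrapped, padded base64 output for nbytes of input."""
    return 4 * ((nbytes + 2) // 3)

# Same idea as the dd one-liner: 57 random bytes, base64 encoded.
raw = os.urandom(57)
password = base64.b64encode(raw).decode("ascii")

print(encoded_length(57), len(password))  # 76 76
print(encoded_length(58))                 # 80
print(encoded_length(58) > WRAP)          # True
```

Since 57 is a multiple of 3, the output also needs no = padding, and with count=2 each of the two wrapped lines is a full 76-character password of its own.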

TODO: find out if plan9 actually has /dev/random.

How to generate an .ico file with plan9-port

Provided a 32x32 pixel png named test.png, we can produce a favicon.ico file with plan9-port.

9 png < test.png | 9 toico > favicon.ico

A gentle reminder that 9 is simply a script that wraps the plan9-port commands so that they do not pollute your normal $PATH. So the programs called are actually png and toico. The png program simply takes a .png through stdin, and outputs a plan9 image file to stdout. Then the toico program takes a plan9 image file from stdin, and outputs a .ico file to stdout.

However, that’s not how I produced my favicon.ico file. Instead, from plan9-port’s paint, I created a completely black image, and named it test.img. From there I ran the following:

9 crop -r 0 0 32 32 < test.img | 9 toico > favicon.ico

This has some similarities to the previous, but instead of converting to a plan9 image, we crop it with the crop program, pulling out a 32x32 square from the top left corner of the image. Since crop takes in a plan9 image, and outputs a plan9 image, we can just run toico as before.

Venti protocol information

Protocol Reference (duplicated in the man-page dump)

The following is the protocol reference for the venti protocol. Note that the number in front of each line is the actual value for the Vt* message. Thus VtTping is (2, $tag) and is answered with VtRping (3, $tag). Assuming that tag is 0, and going by the man page’s description of version 02, I think this means the messages would have 00 02 02 00 and 00 02 03 00 as their hex values: a 2-byte size field which, unlike 9p’s, does not include itself. Version 04 uses a 4-byte size field instead, which would make them 00 00 00 02 02 00 and 00 00 00 02 03 00.

        4   VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]
        5   VtRhello tag[1] sid[s] rcrypto[1] rcodec[1]

        2   VtTping tag[1]
        3   VtRping tag[1]

       12   VtTread tag[1] score[20] type[1] pad[1] count[2]
       13   VtRread tag[1] data[]

       14   VtTwrite tag[1] type[1] pad[3] data[]
       15   VtRwrite tag[1] score[20]

       16   VtTsync tag[1]
       17   VtRsync tag[1]

        1   VtRerror tag[1] error[s]

        6   VtTgoodbye tag[1]

        8   VtTauth0
        9   VtRauth0
       10   VtTauth1
       11   VtRauth1
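As a sanity check on the framing, here is a minimal Python sketch that builds messages the way the man page describes them: a big-endian size field that counts only the bytes following it, then the type byte, the tag byte, and the parameters. The helper names (frame, vt_string, vt_var) and the uid “anonymous” are made up for the example; this is not taken from libventi.

```python
import struct

# Message type numbers from the table above.
VT_TPING = 2
VT_RPING = 3
VT_THELLO = 4

def frame(msg_type, tag, params=b"", version="02"):
    """Prefix type, tag and params with a big-endian size field.

    The size counts the bytes after the size field itself. It is two
    bytes wide in version 02 and four bytes wide in version 04.
    """
    body = bytes([msg_type, tag]) + params
    fmt = ">H" if version == "02" else ">I"
    return struct.pack(fmt, len(body)) + body

def vt_string(text):
    """Encode a string[s] field: two-byte count, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">H", len(data)) + data

def vt_var(data=b""):
    """Encode a parameter[n] field: one-byte count, then n bytes."""
    return bytes([len(data)]) + data

print(frame(VT_TPING, 0).hex(" "))                # 00 02 02 00
print(frame(VT_RPING, 0).hex(" "))                # 00 02 03 00
print(frame(VT_TPING, 0, version="04").hex(" "))  # 00 00 00 02 02 00

# VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]
hello = frame(
    VT_THELLO, 0,
    vt_string("02") + vt_string("anonymous") + bytes([0]) + vt_var() + vt_var(),
)
print(hello.hex(" "))
```

The strength, crypto and codec fields are left empty here, since the man page says they are currently ignored.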

Dump Production Command

9 man 7 venti | 9 grep -v -e 'Page [0-9]' -e '\)$' > ~/venti.man.txt

Manpage Dump

 NAME
      venti - archival storage server

 DESCRIPTION
      Venti is a block storage server intended for archival data.
      In a Venti server, the SHA1 hash of a block's contents acts
      as the block identifier for read and write operations.  This
      approach enforces a write-once policy, preventing accidental
      or malicious destruction of data.  In addition, duplicate
      copies of a block are coalesced, reducing the consumption of
      storage and simplifying the implementation of clients.

      This manual page documents the basic concepts of block
      storage using Venti as well as the Venti network protocol.

      Venti(1) documents some simple clients.  Vac(1), vacfs(4),
      and vbackup(8) are more complex clients.

      Venti(3) describes a C library interface for accessing Venti
      servers and manipulating Venti data structures.

      Venti(8) describes the programs used to run a Venti server.

    Scores
      The SHA1 hash that identifies a block is called its score.
      The score of the zero-length block is called the zero score.

      Scores may have an optional label: prefix, typically used to
      describe the format of the data.  For example, vac(1) uses a
      vac: prefix, while vbackup(8) uses prefixes corresponding to
      the file system types: ext2:, ffs:, and so on.

    Files and Directories
      Venti accepts blocks up to 56 kilobytes in size. By conven-
      tion, Venti clients use hash trees of blocks to represent
      arbitrary-size data files.  The data to be stored is split
      into fixed-size blocks and written to the server, producing
      a list of scores.  The resulting list of scores is split
      into fixed-size pointer blocks (using only an integral num-
      ber of scores per block) and written to the server, produc-
      ing a smaller list of scores.  The process continues, even-
      tually ending with the score for the hash tree's top-most
      block.  Each file stored this way is summarized by a VtEntry
      structure recording the top-most score, the depth of the
      tree, the data block size, and the pointer block size.  One
      or more VtEntry structures can be concatenated and stored as
      a special file called a directory.  In this manner, arbi-
      trary trees of files can be constructed and stored.

      Scores passed between programs conventionally refer to
      VtRoot blocks, which contain descriptive information as well
      as the score of a directory block containing a small number
      of directory entries.

      Conventionally, programs do not mix data and directory
      entries in the same file.  Instead, they keep two separate
      files, one with directory entries and one with metadata ref-
      erencing those entries by position.  Keeping this parallel
      representation is a minor annoyance but makes it possible
      for general programs like venti/copy (see venti(1)) to tra-
      verse the block tree without knowing the specific details of
      any particular program's data.

    Block Types
      To allow programs to traverse these structures without need-
      ing to understand their higher-level meanings, Venti tags
      each block with a type.  The types are:

          VtDataType     000  data
          VtDataType+1   001  scores of VtDataType blocks
          VtDataType+2   002  scores of VtDataType+1 blocks
          ...
          VtDirType      010  VtEntry structures
          VtDirType+1    011  scores of VtDirType blocks
          VtDirType+2    012  scores of VtDirType+1 blocks
          ...
          VtRootType     020  VtRoot structure

      The octal numbers listed are the type numbers used by the
      commands below.  (For historical reasons, the type numbers
      used on disk and on the wire are different from the above.
      They do not distinguish VtDataType+n blocks from VtDirType+n

    Zero Truncation
      To avoid storing the same short data blocks padded with dif-
      fering numbers of zeros, Venti clients working with fixed-
      size blocks conventionally `zero truncate' the blocks before
      writing them to the server.  For example, if a 1024-byte
      data block contains the 11-byte string `hello world' fol-
      lowed by 1013 zero bytes, a client would store only the 11-
      byte block.  When the client later read the block from the
      server, it would append zero bytes to the end as necessary
      to reach the expected size.

      When truncating pointer blocks (VtDataType+n and VtDirType+n
      blocks), trailing zero scores are removed instead of trail-
      ing zero bytes.

      Because of the truncation convention, any file consisting
      entirely of zero bytes, no matter what its length, will be
      represented by the zero score: the data blocks contain all
      zeros and are thus truncated to the empty block, and the
      pointer blocks contain all zero scores and are thus also
      truncated to the empty block, and so on up the hash tree.

    Network Protocol
      A Venti session begins when a client connects to the network
      address served by a Venti server; the conventional address
      is tcp!server!venti (the venti port is 17034).  Both client
      and server begin by sending a version string of the form
      venti-versions-comment\n.  The versions field is a list of
      acceptable versions separated by colons.  The protocol
      described here is version 02.  The client is responsible for
      choosing a common version and sending it in the VtThello
      message, described below.

      After the initial version exchange, the client transmits
      requests (T-messages) to the server, which subsequently
      returns replies (R-messages) to the client.  The combined
      act of transmitting (receiving) a request of a particular
      type, and receiving (transmitting) its reply is called a
      transaction of that type.

      Each message consists of a sequence of bytes.  Two-byte
      fields hold unsigned integers represented in big-endian
      order (most significant byte first).  Data items of variable
      lengths are represented by a one-byte field specifying a
      count, n, followed by n bytes of data.  Text strings are
      represented similarly, using a two-byte count with the text
      itself stored as a UTF-encoded sequence of Unicode charac-
      ters (see utf(7)).  Text strings are not NUL-terminated: n
      counts the bytes of UTF data, which include no final zero
      byte.  The NUL character is illegal in text strings in the
      Venti protocol.  The maximum string length in Venti is 1024
      bytes.

      Each Venti message begins with a two-byte size field speci-
      fying the length in bytes of the message, not including the
      length field itself.  The next byte is the message type, one
      of the constants in the enumeration in the include file
      <venti.h>.  The next byte is an identifying tag, used to
      match responses to requests.  The remaining bytes are param-
      eters of different sizes.  In the message descriptions, the
      number of bytes in a field is given in brackets after the
      field name.  The notation parameter[n] where n is not a con-
      stant represents a variable-length parameter: n[1] followed
      by n bytes of data forming the parameter.  The notation
      string[s] (using a literal s character) is shorthand for
      s[2] followed by s bytes of UTF-8 text.  The notation
      parameter[] where parameter is the last field in the message
      represents a variable-length field that comprises all
      remaining bytes in the message.

      All Venti RPC messages are prefixed with a field size[2]
      giving the length of the message that follows (not including
      the size field itself).  The message bodies are:

        4   VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]
        5   VtRhello tag[1] sid[s] rcrypto[1] rcodec[1]

        2   VtTping tag[1]
        3   VtRping tag[1]

       12   VtTread tag[1] score[20] type[1] pad[1] count[2]
       13   VtRread tag[1] data[]

       14   VtTwrite tag[1] type[1] pad[3] data[]
       15   VtRwrite tag[1] score[20]

       16   VtTsync tag[1]
       17   VtRsync tag[1]

        1   VtRerror tag[1] error[s]

        6   VtTgoodbye tag[1]

        8   VtTauth0
        9   VtRauth0
       10   VtTauth1
       11   VtRauth1

      Each T-message has a one-byte tag field, chosen and used by
      the client to identify the message.  The server will echo
      the request's tag field in the reply.  Clients should
      arrange that no two outstanding messages have the same tag
      field so that responses can be distinguished.

      The type of an R-message will either be one greater than the
      type of the corresponding T-message or Rerror, indicating
      that the request failed.  In the latter case, the error
      field contains a string describing the reason for failure.

      Venti connections must begin with a hello transaction.  The
      VtThello message contains the protocol version that the
      client has chosen to use.  The fields strength, crypto, and
      codec could be used to add authentication, encryption, and
      compression to the Venti session but are currently ignored.
      The rcrypto, and rcodec fields in the VtRhello response are
      similarly ignored.  The uid and sid fields are intended to
      be the identity of the client and server but, given the lack
      of authentication, should be treated only as advisory.  The
      initial hello should be the only hello transaction during
      the session.

      The ping message has no effect and is used mainly for debug-
      ging.  Servers should respond immediately to pings.

      The read message requests a block with the given score and
      type.  Use vttodisktype and vtfromdisktype (see venti(3)) to
      convert a block type enumeration value (VtDataType, etc.)
      to the type used on disk and in the protocol.  The count
      field specifies the maximum expected size of the block.  The
      data in the reply is the block's contents.

      The write message writes a new block of the given type with
      contents data to the server.  The response includes the
      score to use to read the block, which should be the SHA1
      hash of data.

      The Venti server may buffer written blocks in memory, wait-
      ing until after responding to the write message before writ-
      ing them to permanent storage.  The server will delay the
      response to a sync message until after all blocks in earlier
      write messages have been written to permanent storage.

      The goodbye message ends a session.  There is no VtRgoodbye:
      upon receiving the VtTgoodbye message, the server terminates
      up the connection.

      Version 04 of the Venti protocol is similar to version 02
      (described above) but has two changes to accomodates larger
      payloads.  First, it replaces the leading 2-byte packet size
      with a 4-byte size.  Second, the count in the VtTread packet
      may be either 2 or 4 bytes; the total packet length distin-
      guishes the two cases.

 SEE ALSO
      Sean Quinlan and Sean Dorward, ``Venti: a new approach to
      archival storage'', Usenix Conference on File and Storage
      Technologies , 2002.
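To tie the Scores and Zero Truncation sections together, here is a small Python sketch of both conventions for data blocks. The function names are made up for illustration and this is not the libventi API. Pointer blocks are not covered: for those, the man page says trailing zero scores are removed rather than trailing zero bytes.

```python
import hashlib

def score(block):
    """A block's score: the SHA1 hash of its contents, as a hex string."""
    return hashlib.sha1(block).hexdigest()

def zero_truncate(block):
    """Strip trailing zero bytes from a data block before writing it."""
    return block.rstrip(b"\x00")

def zero_extend(block, size):
    """Pad a block read back from the server out to the expected size."""
    return block + b"\x00" * (size - len(block))

# The score of the zero-length block.
ZERO_SCORE = score(b"")
print(ZERO_SCORE)  # da39a3ee5e6b4b0d3255bfef95601890afd80709

# The man page's example: `hello world' followed by 1013 zero bytes.
padded = b"hello world" + b"\x00" * 1013
stored = zero_truncate(padded)
print(len(stored))                          # 11
print(zero_extend(stored, 1024) == padded)  # True

# An all-zero block truncates to the empty block, hence the zero score.
print(score(zero_truncate(b"\x00" * 1024)) == ZERO_SCORE)  # True
```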