FOSS Unleashed

How to get sed to pull out a block

For a good long while, sed has been a great mystery to me. After all, it is styled after the infamous ed editor. Another thing that has eluded me fairly often is how I can, from a shell prompt, print out a block of code, and only that block of code.

; sed '/cfg_services/,/)/p;d' /etc/rc.conf
cfg_services+=(
eudev
'mount' 'sysctl'
'@lo.iface' '@ntp' '@scron'
alsa
@agetty-tty{2..6}
sshd
@rc.local runit
nginx
)

Great. We’re looking for ‘cfg_services’, then for the next ‘)’, and printing every line in that range with p; the d then deletes every line before sed can auto-print it, so nothing else is output. But what if we had nested blocks of code? Assuming one has sane indentation, one can rely on that:

int main(void) {
	if (foo) {
		printf("Hello, World!\n");
	}

	if (bar) {
		printf("Goodbye, World!\n");
	}
}

Given that code, we want to print the foo check:

; sed '/if .foo/,/\t}/p;d' test.c
	if (foo) {
		printf("Hello, World!\n");
	}

If we want both blocks, we can simplify the first regex:

; sed '/if (/,/\t}/p;d' test.c
	if (foo) {
		printf("Hello, World!\n");
	}
	if (bar) {
		printf("Goodbye, World!\n");
	}

Though this might not be as useful, since we can’t quite operate on each block individually.
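
A common alternative spelling of the same idea is sed -n with p, which switches off the automatic printing instead of deleting every line afterwards. The sketch below wraps that in a small helper. The name print_block and the sample file are made up for illustration, and the patterns must not contain a / because they are pasted straight into the sed script.

```shell
#!/bin/sh
# print_block START END FILE   (hypothetical helper name)
# Prints every line from a match of START through the next match of END.
print_block() {
    # -n disables auto-print, so only lines hit by the explicit p are shown
    sed -n "/$1/,/$2/p" "$3"
}

# Build a small tab-indented sample so the example is self-contained
tab=$(printf '\t')
demo=$(mktemp)
printf 'int main(void) {\n\tif (foo) {\n\t\tprintf("Hello");\n\t}\n\n\tif (bar) {\n\t\tprintf("Goodbye");\n\t}\n}\n' > "$demo"

# Anchor the end pattern to "start of line, one tab, brace" so the
# unindented closing brace of main() can never end the range
print_block 'if (foo' "^$tab}" "$demo"

rm -f "$demo"
```

Using a literal tab from printf instead of \t in the pattern is a deliberate choice here: \t inside a sed regex is an extension that not every sed understands.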

Now, what if we wanted to add something to the block with sed? How would we do that? sed has an -i flag that allows it to modify the file it was given; however, we have to operate on the file differently.

; sed -i '/if (foo/,/\t}/ { /}/ i \\t\tfoo();'\n'}' test.c
; cat test.c
int main(void) {
	if (foo) {
		printf("Hello, World!\n");
		foo();
	}

	if (bar) {
		printf("Goodbye, World!\n");
	}
}

A few things to note here. First, if you want to test your arguments to sed like this, then omit the -i and let it print out the entire file, or pipe that into another sed that only prints the block you want to modify. Second, -i suppresses sed’s usual output to stdout; it writes back into the file it read from instead. Thus, to see the results, we have to print the file out again with cat.

But how does it work? We’re using the same pattern matching as before, but instead of having the address affect the p command, we’re having it affect a command block in sed, which in this example is: '{ /}/ i \\t\tfoo();'\n'}'. Here, we’re giving a second address to work on, but this address only applies within the block we found: we’re looking for the closing brace, and running the i command on it. The i and a commands tell sed to insert and append respectively. If we did not give these commands a second address, they would run on every line in the first address range. Both commands are terminated with a newline, which my shell natively supports, but if you are using bash you will need to use $'\n' instead of the raw \n. We also have to provide an extra backslash as an escape, as both a and i will consume one.
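
The same edit can also be written in sed's classic multi-line form, which sidesteps the shell-specific newline quoting: i\ ends its line, the text to insert sits on the next line, and the closing } of the command block gets a line of its own. This is a sketch under assumptions: insert_before_close and the sample file are invented here, and the inserted line starts with two literal tab characters, which GNU sed keeps but other implementations may strip.

```shell
#!/bin/sh
# insert_before_close FILE   (hypothetical helper name)
# Writes FILE to stdout with a foo(); call inserted just before the brace
# that closes the "if (foo" block. The file itself is left untouched.
insert_before_close() {
    sed '/if (foo/,/}/{
/}/i\
		foo();
}' "$1"
}

demo=$(mktemp)
printf 'int main(void) {\n\tif (foo) {\n\t\tprintf("Hello");\n\t}\n}\n' > "$demo"

# No -i here: the result goes to stdout so it can be checked first
insert_before_close "$demo"

rm -f "$demo"
```

Leaving -i out of the helper keeps it safe to experiment with; once the output looks right, the same script can be handed to sed -i.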

Nice and simple. Hope everyone has a good week!

PTY allocation issue: strace to the rescue

Quick little post today. I was having a bit of a frustrating issue where my user account could not spawn certain PTY-allocating programs, but could spawn others. Ultimately I ended up using strace to try and figure out where it was failing.

; strace -f abduco -c x trash
...
[pid 26646] openat(AT_FDCWD, "/dev/ptmx", O_RDWR <unfinished ...>
[pid 26644] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=26645, si_uid=10000, si_status=0, si_utime=0, si_stime=0} ---
[pid 26646] <... openat resumed>) = -1 EACCES (Permission denied)
[pid 26644] read(3, <unfinished ...>
[pid 26646] write(4, "server-forkpty: Permission denie"..., 34) = 34
[pid 26644] <... read resumed>"server-forkpty: Permission denie"..., 255) = 34
[pid 26646] close(4 <unfinished ...>
[pid 26644] read(3, <unfinished ...>
[pid 26646] <... close resumed>) = 0
[pid 26644] <... read resumed>"", 221) = 0
[pid 26646] close(3 <unfinished ...>
[pid 26644] write(2, "server-forkpty: Permission denie"..., 34 <unfinished ...>
server-forkpty: Permission denied
[pid 26646] <... close resumed>) = 0
[pid 26644] <... write resumed>) = 34
[pid 26646] close(6 <unfinished ...>
[pid 26644] unlink("/home/R/.abduco/x@workstation4k" <unfinished ...>
[pid 26646] <... close resumed>) = 0
[pid 26646] exit_group(1) = ?
[pid 26644] <... unlink resumed>) = 0
[pid 26644] exit_group(1 <unfinished ...>
[pid 26646] +++ exited with 1 +++
<... exit_group resumed>) = ?
+++ exited with 1 +++

First thing was that abduco forks, and the child attempts the allocation, so strace’s -f option to follow children was needed; the [pid 26646] lines are all output of syscalls the child makes. The issue is the attempt to open the /dev/ptmx file, so I check the permissions:

; ls -l /dev/ptmx 
crw--w---- 1 root tty 5, 2 Mar 16 18:07 /dev/ptmx

Okay… a user with the tty group can write to it, but root can read/write it? That might make sense? But you can see the program is trying to use the O_RDWR flag, it’s trying to open it read/write. So I check a different system, /dev/ptmx has read-write permissions across the board. A quick sudo chmod a+rw /dev/ptmx fixes the issue.

What’s concerning is why eudevd had those permissions in the first place. A restart might have fixed the issue, but I’d rather not needlessly restart a system.
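
A quick way to confirm this kind of problem without reading a full strace log is to ask the shell whether the current user has both read and write permission on the node. A minimal sketch: can_open_rw is a made-up helper name, and the test operators only consult permissions, they do not actually open the device.

```shell
#!/bin/sh
# can_open_rw PATH   (hypothetical helper name)
# Succeeds only if the current user has both read and write permission.
can_open_rw() {
    [ -r "$1" ] && [ -w "$1" ]
}

if can_open_rw /dev/ptmx; then
    echo "/dev/ptmx: read/write allowed"
else
    echo "/dev/ptmx: read/write NOT allowed for $(id -un)"
    # Show the mode and ownership, as in the listing above
    ls -l /dev/ptmx 2>/dev/null || echo "/dev/ptmx does not exist"
fi
```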

Drawing with Plan 9 (Pt 0)

I found a neat little code repository the other day, and decided to play around with it. Now, I’m not totally unfamiliar with Plan 9’s drawing routines, having grazed the manuals a few times. However, those manual pages run into the problem I have with a number of Plan 9’s manual pages, but that’s a post for another day. The manual pages are broadly split into three parts: draw(3), graphics(3), and window(3). Those three man pages contain the bulk of the function documentation for the drawing system, and also the definitions for the three main graphical structures: Image, Display, and Screen.

Overall, at first glance this is all sane. Until you look at the function prototypes. The names are not that bad, but if you’re glancing through them looking for some kind of rectangle primitive, you’re not going to see one. Want to draw a rectangle or fill an Image or window? You want the draw() function. Okay, drawing rectangles is the implicit default. That’s sane. Let’s take a closer look: void draw(Image *dst, Rectangle r, Image *src, Image *mask, Point p); and we immediately have a problem. There are three Image pointers as parameters, and no color. This led me down a little rabbit-hole where I was trying to find out how to color an Image. Surely there’s some means to access the pixel data for an Image right? … right? Let’s take a look:

typedef
struct Image
{
	Display *display; /* display holding data */
	int id; /* id of system-held Image */
	Rectangle r; /* rectangle in data area, local coords */
	Rectangle clipr; /* clipping region */
	ulong chan; /* pixel channel format descriptor */
	int depth; /* number of bits per pixel */
	int repl; /* flag: data replicates to tile clipr */
	Screen *screen; /* 0 if not a window */
	Image *next; /* next in list of windows */
} Image;

Okay, so it’s implicitly a part of a linked list, for some reason, and it has pointers to a Screen but only if there’s a window, and a Display which is noted as “display holding data”. Surely, that means that a Display holds the pixel data right?

typedef
struct Display
{
	...
	void (*error)(Display*, char*);
	...
	Image *black;
	Image *white;
	Image *opaque;
	Image *transparent;
	Image *image;
	Font *defaultfont;
	Subfont *defaultsubfont;
	...
};

Nope. It contains a bunch of Images. I guess that makes sense as a “Display”, but what is with those? We have white, black, opaque? What? Maybe looking at Screen will enlighten us.

typedef
struct Screen
{
	Display *display; /* display holding data */
	int id; /* id of system-held Screen */
	Image *image; /* unused; for reference only */
	Image *fill; /* color to paint behind windows */
} Screen;

Okay, that’s reasonable, but not actually what we needed, it would be weird if it was to be honest. Okay, I don’t have any leads, so I may as well start poking around. My objective is to get the gui-menu.c program to not OLED-flashbang me every time it runs. This is actually pretty simple to solve: draw(screen, screen->r, display->black, nil, ZP); Here we’re working with two globals, display which is a Display * and screen which is an Image *. The latter bit tripped me up for a bit when I needed to access the Screen * because I wanted to muck with its fill property. There are two things that I learned here. First, the fill property doesn’t seem to do anything, doesn’t color the background at all in any situation I could try it on. Secondly, the draw() call I made needs to be in the event loop. Not sure why, there might be a paint event I should be listening to, but at the moment I’m just calling it every loop. Thankfully the event loop does block itself when empty.

So! Mission complete right? Yes. But I would really like to be able to draw my own color, and maybe figure out how to work with the pixel data somehow. Now, it’s not referenced often, but there is a man page for allocimage(3) which has the following prototype: Image *allocimage(Display *d, Rectangle r, ulong chan, int repl, int col). This allocates a new Image that is filled with one single color. None of the arguments are too difficult to come by either. I wanted something like display->black but in my own color, so: allocimage(display, display->black->r, display->black->chan, 1, 0x004040FF); Nice and simple: duplicate the properties from display->black (though I manually set repl to 1, which is also the value that black and white have from display), compile, and… flashbang. A pure white screen. Now, at first I thought I was encoding the color incorrectly, but that wasn’t the case. The problem is the chan parameter, which indicates the pixel channel format. Naturally the black and white images don’t need to store a very complicated color, so that value is 0x31 (not sure what macro that expands to), which is apparently a low color depth. Given I was providing a 32-bit RGBA value, I needed to give RGBA32 as the channel value, and it worked. I have a nice dark non-black window.

Overall, this was an interesting adventure, but definitely not one that endeared the man pages to me. I feel like I’ll want to be taking notes and referencing those instead.

scp considered harmful

For whatever reason scp does not have a means to copy a symlink as a symlink. This makes scp absolutely terrible if you want to back up or archive a directory structure that contains a symlink cycle. If the directory you want to copy contains a .wine directory, it very likely contains a symlink cycle.

As I have been burned by this in the past, I often argue against the usage of scp altogether, instead suggesting alternate means of transferring files. Either use sftp (via sftp, lftp, or FileZilla), or just use tar | tar. I find tar piped into another instance of tar to be quite useful, and quite expressive once you get used to it.

# Copy directories from HOST
ssh HOST tar c dir1 dir2 dir3 | tar x

# Copy directories to HOST
tar c dir1 dir2 dir3 | ssh HOST tar x

# Copy directories to a specific directory on HOST
tar c dir1 dir2 dir3 | ssh HOST tar x -C ~/target/directory/

You can also put pv in between the tar transfers to get a progress bar.

The main thing to note with this technique is that tar c (note the lack of -f) produces a tarball on stdout, and tar x consumes a tarball from stdin. On occasion I find it helpful to copy files like this without ssh; generally I want to preserve a specific directory structure in these instances, or I simply want the benefit of pv to get a sense of the progress of a larger file transfer.
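
The same pipe works between two local directories. The sketch below spells out -f - on both sides, since not every tar build defaults to stdout and stdin when -f is omitted; copy_tree is a made-up helper name. The demo directory contains a symlink that points back at its own directory, the kind of cycle described above, and it arrives as a symlink rather than being followed.

```shell
#!/bin/sh
# copy_tree SRC DST   (hypothetical helper name)
# Copies the contents of SRC into the existing directory DST,
# keeping symlinks as symlinks.
copy_tree() {
    tar -cf - -C "$1" . | tar -xf - -C "$2"
}

src=$(mktemp -d)
dst=$(mktemp -d)
echo hello > "$src/file.txt"
ln -s . "$src/loop"        # symlink cycle: loop points at its own directory

copy_tree "$src" "$dst"
ls -l "$dst"               # loop is still listed as a symlink to "."

rm -rf "$src" "$dst"
```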

Now, some of you might ask “why not rsync?” Which is a fair question: if you can rely on rsync being present on both sides (as it requires this), rsync is in itself a very expressive tool. But that’s not something I can rely on. If there’s ssh on a host, there is also very likely to be tar.

XS List Operations (shift, unshift, push, pop)

All variables in XS are lists. This means that any kind of data that can be represented as an array of simple types (i.e. ints or strings) can be handled by XS in a very simple manner.

shift

One of the list operations someone might need is to shift an item out of the front of the list, we can use the “multiple assignment” feature of XS to perform this operation:

; list = 1 2 3 4 5 6 7 8
; (a list) = $list
; var a list
a = 1
list = 2 3 4 5 6 7 8

Note: the var command simply prints the name and value of variables.

unshift

Prepending items to an array is commonly known as unshifting, XS will flatten all lists, making this operation trivial:

; list = 0 1 $list
; var list
list = 0 1 2 3 4 5 6 7 8

push

Appending items to an array is exactly as simple as unshift is, and this is often called pushing onto the array.

; list = $list 9 10
; var list
list = 0 1 2 3 4 5 6 7 8 9 10

pop

Removing the last item of an array is no longer trivial, so we have a helper that takes the last item and puts it at the front, letting us shift the final value out.

; (a list) = <={ %shift $list }
; var a list
a = 10
list = 0 1 2 3 4 5 6 7 8 9

The code for the %shift fragment can be seen here:

fn %shift {|l|
    let (n = $#l; m) {
        if {~ $n 0 1} {
            # nothing to do, do nothing
            result $l
        } else if {~ $n 2} {
            result $l(2 1)
        } else {
            m = `($n - 1)
            result $l($n 1 ... $m)
        }
    }
}

Distributed FS Research

These are just some notes of things I want to look into:

Full-fledged file-systems:

CAS:

  • Venti
  • BlobIt (Built-on BookKeeper)

Parts or unknown:

Not interested:

  • GlusterFS – Needs too much initial setup and configuration, want a JBOD system
  • CephFS – See GlusterFS
  • Lizard – Looks like a poorly maintained fork of MooseFS

Functions in XS that need two lists

One of the more curious things about XS is how nice it is to use when working with a list of things. Bash’s awkward syntax for its arrays is one of the reasons I stopped using it.
However, that is not to say that XS is without fault.

; fn twoLists {|A B|
    echo A \= $A
    echo B \= $B
}
; twoLists (1 2 3) (A B C)
A = 1
B = 2 3 A B C
; X = 1 2 3; Y = A B C
; twoLists $X $Y
A = 1
B = 2 3 A B C

When you give an XS fragment two lists, it passes those as a single list, and XS does not allow nested lists, so the separation between the two is completely removed.
However, XS is at least inspired by functional languages, and you can pass a fragment around, which is effectively a function. You can even have a fragment return a fragment, and closure behavior occurs.

My initial idea on how to solve this was to create a fragment that returns a closure that returns the list. Clojure has such a function, which it calls constantly.

fn %constantly {|list|
    result { result $list }
}

This does mean that the fragment that would need to use such behavior must always use it, so you have behavior like the following:

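fn twoLists {|fn-A fn-B|
    # fn- prefixed parameters are callable by their bare names,
    # the same pattern %common uses further down
    let (A = <=A; B = <=B) {
        echo A \= $A
        echo B \= $B
    }
}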

But at least that behavior is correct:

; twoLists <={ %constantly 1 2 3 } <={ %constantly A B C }
A = 1 2 3
B = A B C

It is however fairly unwieldy to make use of. Functional languages also offer something called currying, which is something I’ve struggled to see the use of (XS and JavaScript being the only languages where I do anything “functional”). So what did I actually need this for? List comparisons. I have two lists, and I want to get a list of what’s common between them. Using %constantly the fragment ends up looking like:

fn %common {|fn-p1 fn-p2|
    let (list = <=p1; res = ()) {
        for i <=p2 {
            if {~ $i $list} {
                res = $res $i
            }
        }
        result $res
    }
}

And calling it is the ugly echo <={ %common <={ %constantly 1 2 3 } <={ %constantly 2 3 4 } }, but if we rewrite it in a way that implements an emulated currying (I am aware that one could argue this isn’t currying):

fn %common {|listA|
    result {|listB|
        let (res = ()) {
            for i $listA {
                if {~ $i $listB} {
                    res = $res $i
                }
            }
            result $res
        }
    }
}

This new version looks a little cleaner in my opinion, and the calling behavior is both cleaner and potentially more useful!

echo <={ <={ %common 1 2 3 } 2 3 4 }

Obviously there’s a limit to how clean it can be with XS, but the fact that %common now returns a closure that is effectively “return what matches with my stored list” means I can save the fragment, then call it against multiple lists if needed.

MusicPD Soft Ramping Alarm Clock

For a long while, I’ve had a disdain for the abrupt awakening caused by traditional alarm clocks. I wanted something that was still going to wake me, but wouldn’t cause my heart to be racing first thing in the morning. So I wrote some crontab entries to help:

0	5	*	*	*	mpc vol 10
0	5	*	*	*	mpc play
5	5	*	*	*	mpc vol 20
5	5	*	*	*	mpc play
10	5	*	*	*	mpc vol 30
10	5	*	*	*	mpc play
15	5	*	*	*	mpc vol 40
15	5	*	*	*	mpc play
20	5	*	*	*	mpc vol 50
20	5	*	*	*	mpc play
25	5	*	*	*	mpc vol 60
25	5	*	*	*	mpc play
30	5	*	*	*	mpc vol 70
30	5	*	*	*	mpc play
35	5	*	*	*	mpc vol 80
35	5	*	*	*	mpc play
40	5	*	*	*	mpc vol 90
40	5	*	*	*	mpc play

For those not used to reading crontabs: at 05:00 this sets the volume to 10% and starts playing music, then every five minutes it raises the volume by 10% and issues play again (as the music might have been turned off), until it reaches the normal listening volume (90% in this case).

Of course, writing that by hand is a pain, so I have an xs script generate that chunk of my crontab for me:

# User configurable values
max_vol	= 90
min_vol	= 0
hour	= 5
min_step	= 5

#####

# State variables
vol	= $min_vol
min	= 0

while {$vol :lt $max_vol} {
    # Increment volume
    vol	= `($vol + 10)
    cron	= $min $hour \* \* \*

    # %flatten takes a string and a list, it joins the list with the string
    echo <={ %flatten \t $cron 'mpc vol '^$vol }
    echo <={ %flatten \t $cron 'mpc play' }

    # Increment minutes
    min = `($min + $min_step)
}
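
For readers without xs, roughly the same generator can be sketched in POSIX sh. This is a translation for illustration, not the script actually in use: gen_alarm_cron and its argument order are made up here, and the 10% volume step is hard-coded as in the xs version.

```shell
#!/bin/sh
# gen_alarm_cron MAX_VOL HOUR MIN_STEP   (hypothetical helper name)
# Prints tab-separated crontab lines that raise the mpd volume by 10
# every MIN_STEP minutes, starting at HOUR:00, until MAX_VOL is reached.
gen_alarm_cron() {
    max_vol=$1
    hour=$2
    min_step=$3
    vol=0
    min=0
    while [ "$vol" -lt "$max_vol" ]; do
        # Increment volume first, so the first entry is 10%, not 0%
        vol=$((vol + 10))
        printf '%s\t%s\t*\t*\t*\tmpc vol %s\n' "$min" "$hour" "$vol"
        printf '%s\t%s\t*\t*\t*\tmpc play\n' "$min" "$hour"
        # Increment minutes
        min=$((min + min_step))
    done
}

gen_alarm_cron 90 5 5
```

With those arguments the output should match the crontab chunk shown earlier; a start time or step that pushes the minutes past 59 is not handled by this sketch.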

REPL Adventures: Web Scraping With The Web Console

Ever since I began mucking around with node.js I’ve been deeply enamoured with the REPL. For the first little while I had thought it was my first experience with a REPL (turns out bash was, I just didn’t realize it yet). It wasn’t long before I learned the absolute power a REPL provides to a developer (assuming you’re willing to cram a ton of logic into one line). But that’s not the REPL we are going to discuss today. Instead it’s the Web Console, and we’re going to use it to scrape a website.

Now, obviously, if we’re wanting to scrape the same website repeatedly, the Web Console is not the proper tool, but it still can be invaluable to get one’s logic straight.

I wanted to convert an image gallery into a CSV file. This image gallery happened to be hand rolled (maybe a script made it, it was very nice and uniform), but the key factor was it was all on one page. The HTML looked like this:

<div class="gallery">
  <a target="_blank" href="p/f/20211217_ThreeWiseMen.jpg">
    <img src="p/t/20211217_ThreeWiseMen.jpg" alt="Three Wise Men">
  </a>
  <div class="desc">Dec. 17th. Three Wise Men</div>
</div>

<div class="gallery">
  <a target="_blank" href="p/f/20211217_ThreeMoreWiseMen.jpg">
    <img src="p/t/20211217_ThreeMoreWiseMen.jpg" alt="Three More Wise Men">
  </a>
  <div class="desc">Dec. 17th. Three More Wise Men</div>
</div>

So, the first thing to do is grab the first entry:

> $('.gallery')
<div class="gallery">

We can see a few things here: my input is the $('.gallery') call, and the result of that call is printed on the next line. Not shown here is Firefox’s gear icon, which allows us to interact with the result as if it were in the Inspector tab of the Web Developer’s Toolbox. This can be helpful, but we won’t need it today.

> $('.gallery').innerHTML
'
  <a target="_blank" href="p/f/20211217_ThreeWiseMen.jpg">
    <img src="p/t/20211217_ThreeWiseMen.jpg" alt="Three Wise Men">
  </a>
  <div class="desc">Dec. 17th. Three Wise Men</div>
'

Okay, here we get a better view of what we’re actually after. We want four bits of information: The p/f URL, the p/t URL, the alt-text, and the description. I quickly test a theory:

> $('div', $('.gallery')).innerHTML
'Dec. 17th. Three Wise Men'

Great! You can see here that $() can take a second argument, an HTMLNode to be scanned instead of document. The three other bits of information can be easily gathered:

> n = $('.gallery'); [$('a', n).href, $('img', n).src, $('img', n).alt, $('div', n).innerHTML]
Array(4) [ "http://www.nicesparks.com/p/f/20211217_ThreeWiseMen.jpg", "http://www.nicesparks.com/p/t/20211217_ThreeWiseMen.jpg", "Three Wise Men", "Dec. 17th. Three Wise Men" ]

One thing to note: the last expression is the one that will be printed. In this case, it’s that array line, which I use to print all four values I’m after at once. Now we’re going to want to iterate over all of the elements:

> a = []; g = $$('.gallery'); for (var n of g) { }; a
Array []

You’ll notice I’m missing the break-out step of the last line, this is a quick sanity check against my syntax before it gets gnarly. You always want to know that your syntax is sane before things get complicated.

> a = []; g = $$('.gallery'); for (var n of g) { a.push([$('a', n).href, $('img', n).src, $('img', n).alt, $('div', n).innerHTML]) }; a
Uncaught TypeError: can't access property "href", $(...) is null

That… was not expected. Do a quick peek:

> a.length
120
> g[120].innerHTML
"
  ==================== VACATION! ==============
"

Ahh. Okay, that explains that. We don’t really care about that to be honest, so we’ll just swallow some errors. But, we might want to see what else goes wrong.

> f = []; a = []; g = $$('.gallery'); for (var n of g) { try { a.push([$('a', n).href, $('img', n).src, $('img', n).alt, $('div', n).innerHTML]) } catch (e) { f.push(n.innerHTML) } }; a 
Array(409) [ (4) […], (4) […], (4) […], (4) […], (4) […], (4) […], (4) […], (4) […], (4) […], (4) […], … ]
> f.length
2
> f
Array [ "\n  ==================== VACATION! ==============\n", "\n  ==================== Old stuff ==============\n" ]

Okay, there weren’t too many of those informational blocks. But we now have our results in a! So let’s get that into something usable:

> copy(JSON.stringify(a, null, '\t'))
String was copied to clipboard. 

The copy() function is unique to the Web Console (as are $() and $$()); it copies a block of text into our clipboard. We use the extra arguments of JSON.stringify() to pretty-print the resulting JSON. The second argument is for a replacer (transformation) function, which we don’t need, so it is set to null; the third argument is the indentation string, which is a tab. With our JSON in the clipboard we quickly pop a shell:

; xsel > gallery.json
; head gallery.json
[
    [
        "http://www.nicesparks.com/p/f/20211217_ThreeWiseMen.jpg",
        "http://www.nicesparks.com/p/t/20211217_ThreeWiseMen.jpg",
        "Three Wise Men",
        "Dec. 17th. Three Wise Men"
    ],
    [
        "http://www.nicesparks.com/p/f/20211217_ThreeMoreWiseMen.jpg",
        "http://www.nicesparks.com/p/t/20211217_ThreeMoreWiseMen.jpg",

Awesome! The xsel command just dumps the clipboard to standard out, and we write it to “gallery.json”, then we just check what we have with head which prints the first ten lines of a file. A nice quick web-scraping!