Data Structure Optimization

Let's store some data.

I need to store the combination of a 64-bit integer ID and a one-byte bit field. Like this:

type info struct {
  id   int64
  data byte
}

That's all well and good, but I actually need an ordered list of them. Like this:

type list []info

Which wouldn't be so bad if I didn't really need a bunch of them. Like this:

type blob struct {
  infos list
}

var AllTheThings []*blob

It gets rather expensive. The info struct gets padded, so instead of being 8 + 1 = 9 bytes, it is actually 16 bytes on a 64-bit machine. An array of them is therefore 43.75% padding (7 of every 16 bytes). An alternative design is something like this:

type blob struct {
  ids  []int64
  data []byte
}

The small overhead of the extra slice header is quickly outweighed by the win of not having a ton of padding. I lose the type-system-supported invariant that each id is associated with exactly one byte of data. Maintaining that invariant now falls on the programmer: any change to the ids slice must be coordinated with a change to the data slice. But the memory usage wins can be significant. For reference, this sort of change dropped my program's heap usage from about 50GB to about 30GB.
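The padding arithmetic is easy to check. The sketch below uses Python's ctypes, which lays structs out by the platform's C rules; that is an assumption standing in for the Go compiler, though the two agree for this struct on a typical 64-bit machine. The names Info, payload, padded, and bytes_per_entry are made up for the illustration.

```python
import ctypes
from array import array


class Info(ctypes.Structure):
    # Mirrors the Go struct: an int64 id followed by a one-byte bit field.
    _fields_ = [("id", ctypes.c_int64), ("data", ctypes.c_uint8)]


# Bytes that carry information: 8 for the id plus 1 for the bit field.
payload = ctypes.sizeof(ctypes.c_int64) + ctypes.sizeof(ctypes.c_uint8)

# Bytes each struct occupies once the layout rules add trailing padding.
padded = ctypes.sizeof(Info)
padding_fraction = (padded - payload) / padded

# Parallel arrays: one packed array per field, so no per-element padding.
ids = array("q", [101, 205, 309])            # 8 bytes per element
data = array("B", [0b001, 0b010, 0b100])     # 1 byte per element
bytes_per_entry = ids.itemsize + data.itemsize

print(padded, padding_fraction, bytes_per_entry)
```

At 9 bytes per entry instead of 16, the split layout needs about 56% of the memory, which is in the same ballpark as the drop from 50GB to 30GB.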

Can we do better?

Another thing to look at is the real data that is being stored in the id field. For reasons of consistency it's an int64, but the real data I'm working with fits inside an int16 with quite a bit of headroom. So how about compressing things further:

type blob struct {
  ids16  []int16
  ids32  []int32
  ids64  []int64
  data16 []byte
  data32 []byte
  data64 []byte
}

That's even more code to maintain: every bit of data has to end up in the right slice, get sorted correctly, and be rearranged as necessary. But this brought my heap down to 16GB. Almost certainly worth the added code complexity.
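Here is a rough sketch of that bookkeeping, in Python rather than Go, using packed arrays from the array module in place of slices. It assumes the list is ordered by id, which the text above doesn't actually spell out, and the Blob, width_for, insert, and items names are all hypothetical.

```python
from array import array
from bisect import bisect_left
from heapq import merge


class Blob:
    """Ids bucketed by the narrowest integer width that can hold them."""

    WIDTHS = (16, 32, 64)

    def __init__(self):
        # "h", "i", and "q" are 2, 4, and 8 bytes on mainstream platforms.
        self.ids = {16: array("h"), 32: array("i"), 64: array("q")}
        # One data byte per id, kept at the same index as its id.
        self.data = {width: array("B") for width in self.WIDTHS}

    @staticmethod
    def width_for(id_):
        """Pick the narrowest bucket whose signed range contains id_."""
        for width in (16, 32):
            if -(2 ** (width - 1)) <= id_ < 2 ** (width - 1):
                return width
        return 64

    def insert(self, id_, flags):
        """Insert into the matching bucket, keeping it sorted by id."""
        width = self.width_for(id_)
        index = bisect_left(self.ids[width], id_)
        # Both arrays must change together or the pairing is lost.
        self.ids[width].insert(index, id_)
        self.data[width].insert(index, flags)

    def items(self):
        """Return an iterator over (id, flags) pairs in id order."""
        buckets = (zip(self.ids[w], self.data[w]) for w in self.WIDTHS)
        return merge(*buckets, key=lambda pair: pair[0])
```

Calling insert with ids such as 5, 70000, and 2**40 lands them in the 16, 32, and 64 bit buckets respectively, and items stitches the buckets back into one ordered sequence. Every method that touches an ids array has to touch the matching data array too, which is exactly the invariant the type system no longer enforces.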

Incidentally the original form of the struct is still available in my library, with tests to ensure parity between the original and compressed data types. The original format is more convenient to use and works well enough when memory isn't an issue.

But why?

This was done to ensure that all of my data fits in memory. The code has to go through all of the data in order fairly often and it's good to have access to everything in memory rather than trying to save it to disk.

Free Your Mind...

...from Dropbox.

After reading this post by Brian Brewington I got to thinking about the data I store in Dropbox. I have some things that I’ve shared publicly, since Dropbox is pretty convenient for that sort of thing. But I also have a few private directories. Particularly my todo list and my journal.

I’ve kept my todo list and journal in plaintext form for a while, synchronized between my computers and my phone over Dropbox. It is convenient, but I don’t really trust Dropbox with my private data.

So I set out to put together an alternative system.

I’m now using git repositories on my computer to store the private things that I want to access on my phone. My passwords are encrypted at rest. Other repositories are just secured by virtue of my computer’s privacy.

I’m using the very functional Working Copy to push and pull and edit repositories on my phone.

I’ve also moved my journaling process off of the seemingly unsupported Editorial app to a new system. I have created a Workflow workflow, a Drafts action, and a set of Pythonista scripts. It is a rather complex setup, but it puts all of my journaling tools right on my phone. I no longer require a script on my computer to generate easy to read HTML files from the contents of my journal.

One limitation is that iOS Safari cannot seem to open files in the iPhone file system, so the only way to view the journal in HTML form is within Working Copy. The program does pretty well with that, though. It even works with links between documents. One thing it’s missing is a browser history, but so far that hasn’t been an issue.

Some day soon I’ll clean up the Python scripts. They were written in a very exploratory way on a small phone screen, so the code meanders quite a bit.

Have I Been Pwned

Wrote a little command-line utility for checking passwords against the Pwned Passwords API. I'm a bit more comfortable giving my passwords to a tool when I can see and compile the code myself. Putting my many passwords into a form on a website seems like a bad idea. The code can be found at github.com/apiarian/hibp.

This utility is made possible by the mattevans/pwned-passwords library.

Might hook this all up to the pass password management system later.

ScreenSaverView NSBundle

Apparently the NSBundle that is available for macOS screensavers isn't the same as the usual [NSBundle mainBundle]. Instead one needs to use something like this when dealing with the ScreenSaverView:

NSBundle *bundle = [NSBundle bundleForClass:[self class]];
// and then load an image, or something
NSImage *image = [bundle imageForResource:@"gopher"];

Maybe this is documented somewhere, but I seem to have missed it until I came across this helpful comment after much frustration.

Plaintext Journal

Some time ago the journaling app I was using discontinued their current app version and released a new one. The new app disabled the third-party synchronization functionality and required that all entry synchronization go through their own proprietary platform.

This was unacceptable. And a review of the available journaling platforms at the time didn't inspire much confidence or interest. So I decided to roll my own basic journaling system using Editorial for iOS.

The process involves some small amount of manual directory juggling (YYYY/MM/entries.md). Once in the right shared directory, I run my Journal script to create a journal template complete with timestamp and location (and location emoji!).

Editorial synchronizes the journal directory with Dropbox, and I can periodically run a simple Python script to convert the Markdown journal entries into an HTML "website" I can browse on my laptop.

One thing that's missing is an HTML browser on my phone, but I'm more likely to be writing on the phone and reading on my laptop so it hasn't crossed the threshold of annoying-thing-that-must-be-written yet.

Markdown Previews

Say you want to edit file.md in your favorite text editor but want to see a preview of the resulting HTML in Safari. On Mac OS X. Get yourself a new bash shell and run the following commands:

brew install multimarkdown
brew install fswatch

echo '#!/bin/bash
multimarkdown --full "file.md" > \
/tmp/file.html && \
open -g /tmp/file.html' > /tmp/rebuild.sh \
&& chmod +x /tmp/rebuild.sh; \
fswatch -o "file.md" | xargs -n1 /tmp/rebuild.sh

Now every time you save file.md, fswatch will run the rebuild script, passing the file through multimarkdown and handing the resulting HTML file to the open command. The -g flag in open keeps Safari from jumping to the foreground every time the script runs.

Safari has the nice feature that if you click on a link to http://example.com while http://example.com is already loaded in some tab, it will switch to that tab and reload the page. Not sure if Chrome does the same, but there's probably a setting or extension for that.

Note that there's a newline after the first line in the string we're passing to echo.

Who needs a dedicated live previewing markdown editor, anyway?

mailbot.py

Wrote a Python 3 script that looks for specifically formatted emails with photo attachments and uploads said photos to a web-server-accessible folder. The script then replies to the emails with the URL or URLs of the image or images, and deletes the emails from the server. A simple, self-hosted alternative to existing photo uploading services. The script is available as a gist. At the moment it is pretty simplistic and special-cased. If I find that mailbot needs to deal with other sorts of messages, I'll probably generalize it.
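The actual script lives in the gist. What follows is only a minimal sketch of the middle step: pulling image attachments out of an already-parsed message and turning them into URLs. The function name save_photos, the folder argument, and base_url are assumptions for illustration, and fetching the mail, replying, and deleting are left out entirely.

```python
from email.message import EmailMessage
from pathlib import Path


def save_photos(message: EmailMessage, folder, base_url):
    """Write image attachments into folder and return their public URLs."""
    urls = []
    for part in message.iter_attachments():
        # Skip anything that is not an image (documents, signatures, ...).
        if part.get_content_maintype() != "image":
            continue
        # Keep only the final path component of the sender-supplied name.
        name = Path(part.get_filename() or "photo").name
        (Path(folder) / name).write_bytes(part.get_payload(decode=True))
        urls.append(f"{base_url.rstrip('/')}/{name}")
    return urls
```

A real version would also want to handle two photos arriving with the same file name, which this sketch simply overwrites.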

Copy Current Safari Page Link for Day One

Put together a Safari Service in Automator that generates a nice Markdown link to the current page for the Day One journaling app. It copies the link to the system clipboard and launches Day One. Very handy for quickly writing a note about the current website. Once the service is installed it can be bound to a keyboard shortcut through the Keyboard preference pane in System Preferences. I use Option-D.

Fortnight

The word "fortnight" became less common than the phrase "two weeks" in American English in 1906, but persisted as the more popular term in British English until 1977.

I'm not drawing any particular conclusions here. Just thought it was interesting. I wondered when the word "fortnight", so common in the writings of Jane Austen and her contemporaries, went out of fashion. Take a look at this for reference. Note the 1906 crossing of the green and orange lines. Those represent the two terms in the American English corpus. Also note the crossing of the blue and red lines in 1977. Those lines represent the words in British English publications.

Fortnight seems to still be pretty common in British English. It is used in relation to various fortnightly payments that are popular in Britain and related Commonwealth countries.

I say we make an effort to revive the word in America within the fortnight.

See what I did there?

Hopping Out of the Airlock

EVA is not so simple as jumping in a space suit and heading out the door.

Extra-Vehicular Activity (EVA) on the International Space Station (ISS) is a pretty tricky business. The suits astronauts wear these days take time and require help to put on. But here is a point that is not always considered: the atmosphere inside the space suit is not the same as the one on the ISS.

The air in the habitable compartments of the ISS is kept in a pretty comfortable state. The pressure is about the same as here on Earth at sea level. The gas content is also normal: a mix of Oxygen and Nitrogen (plus some Carbon Dioxide). The space suit is different, though. The pressure is lower, about 1/3 of an atmosphere, and it is pure Oxygen (along with any Carbon Dioxide and water vapor coming from the astronaut). The pressure has to be lower to allow the astronaut inside to move their arms and legs: if it were higher, the suit material would be too stiff to move. The astronaut survives the lower pressure because the actual Oxygen content is pretty much the same as that on Earth. So it isn't like they go up a mountain where the air pressure is low and the Oxygen content is low also. The pressure might be low, but the Oxygen is just as it should be.

Ask any diver what happens when a person quickly goes from high pressure to low pressure: the bends. Not something people would want to experience, especially not in space. There are plenty of other things to worry about up there. And lots of lovely sights to see. So astronauts need a way to gradually transition to the no-Nitrogen low-pressure conditions inside the space suit.

One way to manage the transition involves a process that takes two and a half to four hours out of the astronaut's morning. They breathe pure oxygen for a while before going out, with some vigorous exercise thrown in, to slowly pull the Nitrogen out of their bodies. It is difficult work on top of an already difficult day, and apparently not much fun.[1, 2, 3]

The alternative is to "campout" in the airlock overnight. The atmosphere is gradually adjusted while the astronauts sleep through the night. This way they are ready to head out about an hour sooner than they would otherwise. Imagine shaving an hour off your four hour commute. I'm sure you'd be thrilled too! [4]

Of course, this makes one wonder what would happen in an emergency (I blame that Gravity trailer). It seems that it would be impossible to rush the EVA procedure. The emergency repair that was done on the ISS just before Expedition 35 headed home was in planning for a few days at least. Chris Cassidy and Tom Marshburn went out to fix an ammonia coolant leak. I haven't been able to find anything on rapid emergency response for the ISS beyond hopping in a Soyuz capsule and punching out. That's a cramped, bumpy, but reliable ride back to Earth.

See also: NASA Spacesuit Engineer talks Space with Students on YouTube.

Piwik-StatusBoard Bridge

An easily-customizable Piwik-StatusBoard bridge: GitHub Gist.

Wrote this Django views.py function to bridge my Piwik analytics system to my iPad StatusBoard. It generates a graph of the number of visits to each of my two sites over the past few days. The default is 10 days, but I actually found that 11 is a nice number. It should be fairly easy to adapt this code to display other Piwik-based data. It uses the Piwik Python API to communicate with Piwik and Python's own JSON library to process the input and generate the output.

Shuffle Classical Music in iTunes

You can use the grouping field in the iTunes metadata to conveniently shuffle classical music.

iTunes has a few modes of shuffling music. These can be accessed from the Controls⇒Shuffle menu. The old standards are album and song shuffling. There is also grouping shuffling. This seems to shuffle tracks that have no grouping as regular songs. When it encounters music that does have all the same grouping, however, it treats these tracks as albums, and plays them through before jumping to the next thing. Classical music albums often have separate tracks for movements, but also multiple overall pieces (symphonies, for example). All of the tracks in a single symphony on an album can be given the same grouping: "Beethoven: Symphony #1 (LSO)". You are now at liberty to select the classical genre, turn on shuffle, and have a long stream of classical music with the various symphonies and concertos intact.

Other uses of the grouping field may be found around the Internet. The most notable one is using it as a sub-genre for building Smart Playlists.

Counting Digits

A problem came up: how to count the number of digits in an integer? There were three solutions that I could think of: division, strings, and logarithms.

  • You can divide the integer by 10 until you get to zero. The number of divisions is equal to the number of digits.
  • You can convert the number to a string and get the length of the string.
  • You can take the floor of the base 10 log of the number and add one. (The plain ceiling comes up one short for exact powers of ten.)

In order to figure out which of these methods works best, I wrote a short Python script. Starting with a 9-digit number, I ran the number through 1,000 iterations of each method, got the average calculation time, added another digit, and repeated indefinitely. The division method started showing significant decreases in speed at about 200 digits. The string method broke down at about 400 digits. The logarithm method ran with no decrease in calculation speed up to 47,387 digits. The process of adding an extra digit to the number took significantly more time than the 1,000 iterations of the digit count. Not wanting to let my computer's all-day calculation spree go to waste, I present these findings and the algorithm here. Your mileage may vary, depending on the log implementation, and floating-point rounding can leave the result off by one for numbers just below a power of ten.

import math

def numdigits(n):
    # floor(log10(n)) + 1 for n >= 1; a plain ceil undercounts exact powers of ten
    return int(math.floor(math.log10(n))) + 1
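For comparison, the three approaches listed above can be sketched side by side. This is not the original benchmark script, and the function names are made up here. The log version is a variation on the one-liner: it treats the floating-point result as an estimate and then checks it against exact integer powers of ten, on the assumption that the estimate is never off by more than one.

```python
import math


def digits_by_division(n):
    """Count how many times n can be floor-divided by 10 before hitting zero."""
    count = 0
    while n > 0:
        n //= 10
        count += 1
    return count


def digits_by_string(n):
    """Length of the decimal representation."""
    return len(str(n))


def digits_by_log(n):
    """Log-based estimate, corrected with exact integer comparisons."""
    estimate = int(math.log10(n)) + 1
    if 10 ** (estimate - 1) > n:
        # The float rounded up across a power of ten.
        estimate -= 1
    elif 10 ** estimate <= n:
        # The float rounded down across a power of ten.
        estimate += 1
    return estimate
```

All three expect a positive integer. The exact check in digits_by_log costs a big-integer power, so it gives back some of the speed that made the logarithm attractive in the first place.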

Core Animation Zooming

At first pass, this seems like a fairly simple problem: How do you zoom to mouse position in Core Animation? I suppose looking at it now, once I've figured it out, it seems quite straightforward.

The problem lies in the fact that I had a complicated layer structure last time I tried to do this. The first time I tried to arrange the layer hierarchy for my sequential image viewing program, I had far too many layers. I had a root layer, then there was a zoom layer, which would theoretically just zoom. Then there was a layer for panning. Finally there was a layer encasing the images, and then the images themselves. This turned out to be a housekeeping nightmare.

The simple solution is to use two layers. The required root layer, and a layer to contain the images. The image layout is done with respect to the images layer. The panning of the images is handled by changing the position of the images layer. The zooming is handled by applying a scaling transform to the images layer. Simple!

There's still a trick to get things to work just the way I wanted. In order to have the layer zoom in and out while centered on the mouse cursor, I redefine the scroll wheel function for the containing window. Here I figure out the coordinates of the mouse point in the layer being zoomed. It does not matter if the coordinates are outside the layer. I then create a new anchor point by dividing the coordinates I just figured out by the width and height of the layer. This gives me a unit length, rather than a pixel length (damn Apple for making things this complicated; I still don't see why they need to have two different coordinate schemes). Finally, I save the frame of the layer I want to zoom, set up a transaction without actions, set the new anchor point, reset the old frame, and end the transaction. I have to save the frame because changing the anchor point moves the layer, so to change the anchor point in place I reset the frame back to what it was. Now I can apply the scaling transform to the layer however I wish and move on with my life.
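The geometry can be sketched without any Core Animation calls. This is plain arithmetic in Python, not the Objective-C from the actual program, and it assumes a uniform scale with no rotation. All three function names are made up for the illustration.

```python
def anchor_for_point(point, bounds_size):
    """Unit-square anchor for a point given in the layer's own coordinates."""
    (px, py), (width, height) = point, bounds_size
    return (px / width, py / height)


def position_for_anchor(frame, anchor):
    """Position that leaves frame (x, y, width, height) where it already is."""
    x, y, frame_width, frame_height = frame
    ax, ay = anchor
    return (x + ax * frame_width, y + ay * frame_height)


def to_superlayer(point, position, anchor, bounds_size, scale):
    """Map a layer-space point into the parent's space under a uniform scale."""
    (px, py), (width, height) = point, bounds_size
    (ax, ay), (pos_x, pos_y) = anchor, position
    # Scaling happens about the anchor, which always sits at position.
    return (pos_x + (px - ax * width) * scale,
            pos_y + (py - ay * height) * scale)


# Hypothetical numbers: an 800x600 layer at (100, 50), mouse at layer point (200, 150).
bounds = (800, 600)
mouse = (200, 150)
anchor = anchor_for_point(mouse, bounds)
position = position_for_anchor((100, 50, 800, 600), anchor)
for scale in (1.0, 2.0, 0.5):
    print(scale, to_superlayer(mouse, position, anchor, bounds, scale))
```

The point under the mouse maps to the same parent coordinates at every scale, which is the whole reason for moving the anchor there before applying the transform.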

I wish I had a guide like this back when I was first trying to solve this problem half a year ago, so maybe this will help somebody else. Know a simpler way to do it? Drop me a line!