String formatting and printing

Requires GCC 4.3 or better for variadic templates…

/*
format.hpp: A Simple Formatting And Printing Library
====================================================

Examples
--------

    // Returns std::string("5").
    to_str(5);
    
    // Prints "counting 1 two 3\n".
    print("counting % % %", 1, "two", 3.0);

    // Prints to stderr "error: argument 3 invalid\n".
    print_err("error: argument % invalid", 3);
    
    // Prints to std::ostream out "<p>paragraph text</p>\n".
    print(out, "<p>%</p>", "paragraph text");

    // Returns std::string("1 2 3").
    format("% % %", 1, 2, 3);

Adding New Formattable Types
----------------------------

format.hpp uses the << operator to format objects, just like std::cout. To
add support for a new object, just add a new overload to the operator.

Example
-------

    #include <ostream>

    std::ostream& operator <<(ostream& out, const my_type& val)
    {
        // Format val however you want here...
        out << val.to_string();
        return out;
    }

Notes
-----

Requires C++0x variadic templates. Based on example from:
   
<http://www.generic-programming.org/~dgregor/cpp/variadic-templates.html>
*/

#include <string>
#include <sstream>
#include <iostream>
#include <ostream>
#include <stdexcept>

template <typename T>
std::string to_str(T val)
{
    std::stringstream out;
    out << val;
    return out.str();
}

inline void print(std::ostream& out, const char* s) {
    while (*s) {
        if (*s == '%' && *++s != '%')
            throw std::logic_error("invalid format string: "
                                   "missing arguments");
        out << *s++;
    }
    out << std::endl;
}

inline void print(const char* s) {
    print(std::cout, s);
}

inline void print_err(const char* s) {
    print(std::cerr, s);
}

template<typename T, typename... Args>
void print(std::ostream& out,
           const char* s,
           const T& value,
           const Args&... args) {
    while (*s) {
        if (*s == '%' && *++s != '%') {
            out << value;
            print(out, s, args...);
            return;
        }
        out << *s++;
    }
    throw std::logic_error("too many arguments for format string");
}

template<typename T, typename... Args>
void print(const char* s,
           const T& value,
           const Args&... args) {
    print(std::cout, s, value, args...);
}

template<typename T, typename... Args>
void print_err(const char* s,
               const T& value,
               const Args&... args) {
    print(std::cerr, s, value, args...);
}

template<typename T, typename... Args>
std::string format(const char* s,
                   const T& value,
                   const Args&... args) {
    std::stringstream out;
    print(std::cerr, s, value, args...);
    return out.str();
}

C++ to_str

Since I’ve had to write this for the millionth time…

#include <string>
#include <sstream>

using std::string;
using std::stringstream;

template <typename T>
string to_str(T val)
{
    stringstream out;
    out << val;
    return out.str();
}

Not particularly efficient, but convenient.

Hello World in x86, C, Java, and Python

I spent some time this weekend refreshing my knowledge of x86. I thought I’d put up something comparing Hello World in assembly to other languages, to demonstrate some thoughts I’ve had about the development of programming languages over time.

Hello World in x86: (16 lines)

NOTE: this is gas syntax, not MASM, and works on x86 Linux.

	.section .data
hello_world:
	.ascii "hello world!\n"
hello_world_end:
	.section .text
	.global _start
_start:
	movl $4, %eax     # Set the system call number for write. See "man 2 write".
	movl $1, %ebx     # fd = 1 i.e. stdout
	movl $hello_world, %ecx          # buf = hello_world
	movl $hello_world_end, %edx
	subl $hello_world, %edx          # count = hello_world_end - hello_world
	int $0x80         # invoke the write syscall
	movl $1, %eax     # Set the syscall number for exit. See "man 2 exit".
	movl $0, %ebx     # status = 0
	int $0x80         # Invoke exit.

Hello World in C: (4 lines)

#include <stdio.h>
int main() {
  puts("Hello World");
}

Hello World in Java: (5 lines)

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}

Hello World in Python: (1 line)

print "Hello World!"

Analysis:

How do these compare? Mostly you see reducing levels of complexity and line count over time, except with a bump in the road when we run into Java.

The assembly example is both long, and potentially somewhat difficult to read, which is why I added comments. If I added a few more lines, and used symbolic constants, it would probably be more readable, but I’m trying to keep the examples minimal in line count.

The C example is reasonably terse for a language people deride as being low level, or a “portable assembly language”

The Java example is kind of sad because the extra verbosity it introduces has nothing to do with solving the problem at hand, and is not inherent in the underlying hardware. Rather, the extra keywords and the requirement that all code is in a class are purely burdens imposed by the language designer.

The Python example really demonstrates what all the other languages are doing wrong. The entire program is a terse and clear statement of the programs purpose, with no low level details or unnecessary verbiage. It is programming according to Strunk and White.

Stack Traces for C++ Exceptions

Someone at work mentioned that it’s troublesome how C++ doesn’t provide stacktraces with its exceptions like… every other language, thus making C++ exceptions more tricky to debug. Well, on Linux it turns out this is pretty easy to fix:

#include <string>
#include <iostream>
#include "backtrace.h"
#include "TraceableException.h"

using std::string;
using std::cout;
using std::endl;

namespace SomeNamespace
{
	void IntermediaryFunc(int somearg)
	{
		throw TraceableException(BACKTRACE());
	}

	void IntermediaryFunc2(double somearg)
	{
		throw TraceableException(BACKTRACE());
	}
}

int main()
{
	cout << "first exception will be caught and will perform an explicit "
		<< "backtrace..." << endl;

	try
	{
		SomeNamespace::IntermediaryFunc(123);
	} catch (const TraceableException& te)
	{
		cout << te.GetBacktrace() << endl;
	}

	cout << "second exception will not be caught but should still perform a "
		<< "backtrace..." << endl;

	SomeNamespace::IntermediaryFunc2(123.456);

	return 0;
}

The output of the above program is:


backtrace$ ./backtrace
first exception will be caught and will perform explicit a backtrace...
backtrace for main.cpp:14 void SomeNamespace::IntermediaryFunc(int)
./backtrace:SomeNamespace::IntermediaryFunc(int)
./backtrace:main
/lib/tls/i686/cmov/libc.so.6:__libc_start_main
./backtrace:

second exception will not be caught but should still perform a backtrace...
backtrace for main.cpp:19 void SomeNamespace::IntermediaryFunc2(double)
./backtrace:SomeNamespace::IntermediaryFunc2(double)
./backtrace:main
/lib/tls/i686/cmov/libc.so.6:__libc_start_main
./backtrace:

Aborted

Of course, what I haven’t shown is how do I implement BACKTRACE() so that it retrieves a backtrace, and how do I implement TraceableException so that it is smart enough to print its backtrace without being caught in a try catch block.

The short answer is:

  1. 1. libc’s backtrace() to walk the stack and get instruction pointers.
  2. 2. libdl’s dladdr() to translate instruction pointers into file/symbol information.
  3. 3. libstdc++’s abi::__cxa_demangle() to turn mangled C++ symbols into readable signatures like SomeNamespace::IntermediaryFunc2(double)
  4. 4. std::set_handler to create a handler to print the backtrace if a TraceableException isnt’ caught.
  5. 5. __attribute__((noinline)) on functions used to generate the backtrace, so I can reliably remove their frames from the output displayed to the user without worrying about inlining tricking me into removing the wrong frame.

NOTE: All of these functions are available on OSX as well, even dladdr since 10.4 or maybe 10.3

The rest is left as an exercise to the reader for now, but I will probably post my code later.

If you want line numbers for intermediary frames, or symbols for static functions, you need to parse the DWARF debugging data. That sounds like it might be more trouble than it is worth, so I haven’t bothered with it for now.

I’ll put a couple of design details about the BACKTRACE macro here:


struct BacktraceImpl;

struct Backtrace
{
	__attribute__((noinline)) Backtrace(const char* file,
							  const char* callingFunc,
							  int line);
	std::string ToString() const;

private:
	boost::shared_ptr<BacktraceImpl> m_impl;
};

#define BACKTRACE() Backtrace(__FILE__, __PRETTY_FUNCTION__, __LINE__)

std::ostream& operator <<(std::ostream& os, const Backtrace& backtrace);

Backtrace is an immutable type that defers most processing until ToString() is called. I do it this way so that using backtraces in exceptions doesn’t incur symbol lookup cost if I don’t actually need to print the backtrace. Backtrace also gives me the option of adding lower level programmatic access to the backtrace later, e.g. I could add a frame iterator.

Anyway, I think the moral of this story is that you can make C++ do anything, but it’s a lot less work just to use Python where this stuff works out of the box.

Installing python 2.6 or 3.0 without breaking Ubuntu

Various applications on Ubuntu and other linux distros use and depend on a specific python version, usually python 2.5. I’ve seen a number of blog and forum posts with people trying to do the upgrade to 2.6, only to realize that they just broke half the applications on their system.

Here’s how to easily install a newer and incompatible version of python without breaking your distro:

  1. Get all the dependencies. There’s an easy way to do this on debian based Get the python2.6 dependencies from python2.5 like so:
    apt-get build-dep python2.5

    Also for python 3.0:
    apt-get build-dep python3.0

    NOTE: the python3.0 version in the ubuntu 8.10 repository is a release candidate, not the final version.

  2. Get the source distribution of the version of python you want from python.org
  3. Decompress the package:
    tar xvf Python-x.y.tar.gzip
  4. cd into the source directory then:

    ./configure
    make
    make altinstall
    

    The key difference here is that altinstall is used instead of install. Altinstall will not modify your existing “default” python executed by scripts bedining with “#/usr/bin/env python.” Instead you must specify your new version explicitly with “#! /usr/bin/env python2.6″

Note that when compiling python 3.0 on Ubuntu you may run into a problem where the dbm module cannot compile. If you do so, install this patch which was originally from this thread.

To apply the patch cd into the python3.0 directory and run:

patch -p0 < dbm.diff

shared_ptr and make_shared(args)

This doesn’t appear to actually be documented as part of boost, so I’m making a note of it here.

You should never create a boost shared_ptr with shared_ptr<my_type>(new my_type(...)) e.g.:

f(shared_ptr<my_type>(new my_type(...)), other_type(), ...);

This is both inefficient and dangerous.

It is inefficient because the shared_ptr must then allocate a separate object to do reference counting rather than storing the counter the same memory blob as your newed object. This causes two memory allocations instead of one, and is worse in terms of locality as well because the reference count isn’t stored near the object.

It is dangerous because if the constructor for one of the other arguments to f throws an exception, then it’s possible that a MyType will be allocated but never assigned to shared_ptr. Since you count on shared_ptr to deallocate the new instance of my_type, the program will leak. This is a well known problem with c++ that was originally discussed in this paper. The consequences of the interaction between C++ memory management and the C++ exception mechanism are widely believed to break the composibility of C++ functions. If f() returns a pointer and g(x, y) accepts a pointer for x then:

g(new my_type(...), h());

is often thought to be impossible to do correctly if h might throw. Instead the accepted best practice is to break the expression up instead a number of statements which immediately assign pointers to smart pointers that use RAII.

// Don't ever actually use std::auto_ptr. Use boost::scoped_ptr in c++03, and std::unique_ptr in c++09.
scoped_ptr<my_type> my_ptr(new my_type(...));
some_type x = h();
g(my_ptr.get(), x);

However, this is a bit harder to follow. What we really want is a copyable smart pointer that can be used in the signature of g. This we get from boost::shared_ptr in c++03, or std::unique_ptr in c++09 unique_ptr is more efficient and prefereble (I’ll discuss shared_ptr since c++09 isn’t standard yet).

shared_ptr<my_type> my_ptr(new my_type(...));
g(my_ptr, h());

This is a little better. Since we don’t have to ever turn the shared_ptr back into a raw pointer, we don’t have to worry about h() throwing after my_ptr.get() releases the raw pointer from memory management. However, we can’t put everything on one expression because if we did, h() could throw after my_type is allocated, but before it is assigned to a shared_ptr. What we really want is a helper function that a both allocates a my_type and assigns it to a shared_ptr in one go. We could write this helper ourselves, but it would be tedious to provide one for every type we would like to hold in a shared_ptr. Thus, recent versions of shared_ptr include a make_shared<T> which is a generic function that aliases the constructor of T and returns a shared_ptr<T>

g(make_shared<my_type>(/*construction args here*/), h());

is now perfectly safe. Not only that, but it is more efficient than the example where we constructed a shared_ptr with a newed my_type. make_shared is smart enough to allocate a single blob of memory for the referent and the reference count, cutting the number of allocations related to your use of shared_ptr’s in half and increasing locality, thus solving the inefficiency complaint I talked about earlier.

The one complaint I have about make_shared is that it is documented nowhere in boost, nor is it included in the boost/shared_ptr.hpp header. Instead, you need to include the boost/make_shared.hpp header. I only found out about make_shared.hpp after I’d reasoned that something like it needed to exist to solve the counted body problem, and wasted time writing my own. This is exactly why good documentation is so important!

I have another complaint against shared_ptr in general, which that it is overkill in 99% of all memory management situations and inefficient. Almost always a pointer is not shared, and does not need a reference count and the associated book keeping overhead. Instead, the lifetime of a heap allocated object is usually attached to some scope, or to the lifetime of another object. I just need something like a scoped_ptr that knows how to transfer ownership in an exception safe way. auto_ptr tried to do this, but failed because C++03 doesn’t really support move semantics, and sticking an auto_ptr in a standard container (which is technically illegal) will lead to all kinds of undefined behavior. In C++09 we get move semantics and a container safe unique_ptr.

unique_ptr isn’t talked about much when discussing C++09. Instead I see a lot of articles out there lauding the inclusion of shared_ptr in the standard, whereas I see shared_ptr as a necessary evil of C++03. I think that in practice unique_ptr will become the new default pointer type, as it offers the efficiency of the raw pointer, but with exception safety.

unique_ptr presents a nice way of annotating in the source rather than the comments what the memory management semantics of a pointer parameter or pointer return value are. This says:

void f(unique_ptr<some_type> arg);

that f takes ownership of a pointer to some_type, and that the caller doesn’t have to worry about deleting it. Whereas this says:

unique_ptr<some_type> g();

that g() has relinquished ownership of a pointer to some type, and that the caller is responsible for deleting the return value. Of course, the nice thing about unique_ptr, is that it will handle that deletion for you as well

Safari Books Online

I’ve been using Safari Books Online pretty heavily at work. It’s a great tool, but there are definitely problems that make it not nearly as useful of a tool as it could be.

Awesome stuff:

  • With a corporate subscription, I can avoid having to ask my manager to expense books, which is nice since I read enough that if I had to pay $50 for a book, it would come out to a pretty big sum on some months.
  • When I want to start reading, I can do so immediately, instead of waiting weeks for amazon to get the book to me.
Awful stuff:
  • The browser based reader is aweful. You can only read one page at a time, and you can’t rapidly flip back and forth between sections, or navigate in any way at a reasonable speed. They need to both increase the serving capacity of safaribooksonline.com so that I can get responses quicker, and build a more responsive UI. They are essentially selling a service that needs to be used as interactively as a desktop application, but only their interface feels as slow and clunky as your average web 2.0 site.
  • Downloads, which I think are only available to corporate users, are limited per month, and have to be downloaded chapter by chapter. I understand limiting the number of downloads per month to prevent piracy. However, 5 seems low. Do people really only read 5 chapters a month?
  • There is no option that I can see to download an entire book, even if you have enough tokens. This is annoying because I don’t want to have to close and open lot sof PDF files when navigating. I want to have nice hierarchical PDF bookmarks of the table of contents like other ebooks do. To be fair, they have that *within* the chapter*, just not *between* chapters.

The *single* biggest problem with safari books online, too big to be dealt with in a bullet point, is that it has too many O’Rielly books, but is missing key books from a number of other publishers.  O’Rielly Books is known for being a proponent of open source, but it is not known for actually putting out good books. Most O’Rielly books are either useless fluff, or stuff you could have gotten from online documentation.

Thankfully safaribooks online also covers a lot of addison-wesley, which are the gold standard in computer books. Still, a number of key books I’ve wanted to read have been missing, so I’ve had to get them from Amazon.

In particular, they seem to be missing most textbooks discussing computer science topics, and instead have tons of buzzword books, and books on a particular technology (again, stuff I can usually pick up faster using online documentation).

Looking over my bookshelf, I can’t help but notice that some of the better books I’ve read/am reading aren’t available through safari::

  1. The GoF Design Patterns book.
  2. Pattern Oriented Software Architecture.
  3. The Art of Computer Programming.
  4. Artificial Intelligence: A modern approach.

Also, just an idea, If Safari really wants to add value to their service, I’d like to see some reformatted RFCs. I don’t want any content subtracted or added, my only problem with RFC’s is that they are published in ASCII and not html. Just format the text more nicely than can be done in ASCII, add some hyperlinks for navigation, and replace the ascii art with nice graphical diagrams, and they would have something that, if not worth paying for by itself, would make me feel better about the whole deal.

Protocol Buffers

Google has finally released protocol buffers to the world at large. I used version 1 of protocol buffers a lot last summer at Google, and they are totally awesome.

Protocol buffers are a kind of meta file format and meta file format parser if you will. Protocol buffers are a dead easy means of specifying a highly efficient and extensible file format and then automatically generating the code in your favorite language that reads and writes that format. The link has a basic example.

Think of all those tasks that you used XML for, even though you knew it was terribly inefficient *cough* web services *cough*, just because writing a custom binary format and the code to read and write it wouldn’t be worth the effort. Protocol buffers provide a more efficient approach that is just as easy, if not easier, to use. Protocol buffers dump to human readable text for debugging, and the generated parser is significantly easier to use, and more type safe than an XML parser since the parser is generated for *your* data layout specifically.

Yay for google style guide

Google published their C++ style guide for the whole world to see. Aside from advocating against the use of exceptions, it is a pretty good primer on how to write clean C++ code. Google probably has one of the cleaner and most consistent code bases.

Other exported google internal stuff to look at is gflags. A rediculously easy way to turn command line options into typed variables. However, it uses static registration, an idiom I’ve become very disenchanted with over time. The listing of options gets enourmous because all linked in libraries also show their options, even if your tools will provide sensible defaults to the library. Still, probably the easiest way to handle command line options for smaller tools.

Japan Pictures 4

I was in kofu for a couple of days with no internet access so I have a backlog.

Kofu was super nice. I stayed at a very expensive and totally awesome ryokan there for two nights. I’d been hiking through tokyo all week, so I mostly hung out at the ryokan (with a side trip to kofu castle). They had this incredible traditional japanese dinner made from their own kitchen by these old ladies that worked there. It included all sorts of differetn stuff: a single shrimp maybe 7 inches long, really good sashimi, like 5 kinds of meat, fried fish, different kinds of mushrooms, fresh fruit, tofu, and stuff I’d never seen before and don’t know what to call. I wish I’d taken picture.

Kofu is nestled in chubu’s mountains, which are absolutely bueatiful. It is probably only a little bit smaller than seattle population wise, but has a small town feel at least on the outskirts.

It takes too long to add subtitles to images on wordpress, so here’s a quick summary in rough order of the pictures below:

1. Looked at anime crap in Akihabara. They had life sized dolls of asuka that I’m sure brian will drool over. Also astro boy! Atomu!

2. Notice that black guy with a huge pachinko ball afro. Inside this establishment are many people throwing their money away.

3. Manga.

4. The wise and inscrutable Colonel Sanders. He is worshipped as a household god in japan.

5. Electronics crap from akihabara.

6. The advertisement on the front of what I *think* was a house of ill repute right in Akihabara! When I took this photograph, the japanese guys studying the ad jumped out of the way *real quick*.

7. Various statues from the western art museum in ueno.

8. Shrine stuff from Ueno.

9. The night stuff with lots of glowing signs is from Roppongi, and includes Roppongi crossing. Roppingi is basically party central for japan.

10. No!!!!!!!!! Drugs

11.  The super azusa to kofu was awesome, so It took a picture. I got a reserved seat (equivalent to first class on an airline, but since it’s a train it doesn’t cost that much). Almost my entire cabin was empty, I had plenty of leg room, and a chair I could lean way back. Plus, they sell snacks during the trip. This is way different than the tokyo metro trains that are usually standing room only. I’ve been on the trains during the hours when people nearly need to get packed in, and it is *not* fun.

12. Lots of moutains from the train, as well as rice paddies. There are whole towns full of houses where everyone’s backyard is a rice paddy that the family that lives there maintains part time. I’m not sure how this works.

13. Kofu castle was alright, but the main attraction was that after I hiked to the top I got an awesome view of all the surrounding mountains, city, and countryside. I regret I didn’t get to go hiking up the mountains, but it was raining like crazy on and off, and I didn’t want to get caught out there in my short sleaved shirts.

←Older