Today someone went into #gtk+ asking something like, "I use two GtkRuler widgets and call gtk_ruler_set_range() every time I get a motion event. The program uses almost 100% CPU while I move the mouse".
Let's try to reproduce the problem. Try this program and see that indeed it consumes almost 100% CPU if you keep moving the mouse.
Every second, the program prints a counter of how many updates it has done to the rulers. By moving my mouse quickly, I get about 110 updates per second. My display doesn't refresh itself nearly as fast, so the program is doing a lot of wasted work.
Now try this second version. It uses a timer to send at most 60 updates per second to the rulers, regardless of how many events we get per second. CPU consumption goes down to about 15%, and the rulers don't look jerky at all. It prints this output:
events: 0 paints: 0
events: 70 paints: 23
events: 193 paints: 65
events: 318 paints: 106
events: 444 paints: 147
events: 568 paints: 188
events: 693 paints: 230
events: 791 paints: 263
events: 915 paints: 305
events: 1040 paints: 346
events: 1165 paints: 388
events: 1290 paints: 429
events: 1414 paints: 470
events: 1540 paints: 512
events: 1659 paints: 554
Should GtkRuler do this kind of throttling on its own? Maybe, since it is commonly updated from motion event handlers.
Summary: don't repaint more often than your display can handle. You'll just waste CPU time.
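To make the timer idea concrete, here is a minimal sketch of the throttling logic with all the GTK+ parts stripped out. It is not the test program itself: the Throttle struct and the throttle_* functions are hypothetical names made up for illustration, and the "timer" is simulated with a plain loop. The motion handler only records the latest position; the tick function repaints at most once per tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for one pair of rulers; none of these names come from GTK+. */
typedef struct {
    bool     dirty;    /* a motion event arrived since the last paint */
    double   x, y;     /* most recent pointer position */
    unsigned events;   /* motion events seen so far */
    unsigned paints;   /* ruler updates actually performed */
} Throttle;

/* Motion handler: only remember the latest position, do no drawing here. */
void throttle_on_motion(Throttle *t, double x, double y)
{
    assert(t != NULL);
    t->x = x;
    t->y = y;
    t->dirty = true;
    t->events++;
}

/* Timer callback: repaint once if anything changed since the last tick.
 * Returns true when a repaint happened. */
bool throttle_on_tick(Throttle *t)
{
    assert(t != NULL);
    if (!t->dirty)
        return false;
    /* A real program would update the rulers with (t->x, t->y) here. */
    t->dirty = false;
    t->paints++;
    return true;
}

/* Feed evenly spaced events and timer ticks for a number of seconds. */
void throttle_simulate(Throttle *t, unsigned events_per_sec,
                       unsigned ticks_per_sec, unsigned seconds)
{
    unsigned long total_events = (unsigned long) events_per_sec * seconds;
    unsigned long total_ticks = (unsigned long) ticks_per_sec * seconds;
    unsigned long k = 0;

    for (unsigned long j = 0; j < total_ticks; j++) {
        /* Deliver every event that happens before tick j fires. */
        while (k < total_events && k * ticks_per_sec < (j + 1) * events_per_sec) {
            throttle_on_motion(t, (double) k, (double) k);
            k++;
        }
        throttle_on_tick(t);
    }
}
```

Simulating one second with 125 events and 60 ticks gives 125 events but only 60 paints. If events arrive more slowly than the timer fires, every event still gets painted, so nothing is lost at low rates.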
Footprint of non-shared dirty data from libraries
There is an interesting discussion in performance-list about how much unshared data we have in libraries.
I wrote a script to parse /proc/PID/smaps for all the processes in the system (later I slapped my forehead when people showed me I had reinvented the wheel — such a parser already exists in the Python bindings for gtop). It adds up the shared_dirty and private_dirty values so that if 10 processes use libfoo.so, and libfoo.so has 24 KB of dirty data in each process, then we get a total value of 240 KB of dirty data.
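This is not my script, just a rough sketch in C of its core step: walking over smaps-style text and adding up the Private_Dirty lines. The function name sum_private_dirty_kb() is made up, and the sketch works on a string already in memory rather than reading /proc/PID/smaps, and it does not group the totals by library as the real script does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum every "Private_Dirty: N kB" line found in smaps-formatted text.
 * Hypothetical helper; grouping per library is left out on purpose. */
unsigned long sum_private_dirty_kb(const char *smaps_text)
{
    unsigned long total = 0;
    const char *line = smaps_text;

    assert(smaps_text != NULL);

    while (line != NULL && *line != '\0') {
        unsigned long kb;

        /* Mapping headers and other fields simply fail to match. */
        if (sscanf(line, "Private_Dirty: %lu kB", &kb) == 1)
            total += kb;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return total;
}
```

Running this over the smaps text of every process, and keeping one running total per library path from the mapping header lines, would give numbers of the kind shown further down.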
What is this about? The kernel lets processes share most of the data from .so libraries. However, some data cannot be shared and is unique to each process: relocation data (see Ulrich Drepper's famous paper, "How To Write Shared Libraries"), and pre-initialized data that was not marked const and therefore cannot be shared. For example, if your program or library has this:
static int lookup_table[] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
Then your binary will have an unshared, preinitialized block of data which is the size of 10 ints. In particular, the compiler will put this in the .data section of your binary, which is not shared among processes. If your program never modifies that table, you can mark it const: this will make the compiler put it in the .rodata section (read only data), which is shared among processes, using less memory overall.
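Here is a small side-by-side sketch of the two variants. The names lookup_table_rw, lookup_table_ro and lookup() are made up for illustration, and where each table actually lands depends on your compiler and linker; the comments describe the typical result.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_LEN 10

/* Not const: typically placed in the writable .data section. */
int lookup_table_rw[TABLE_LEN] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };

/* const: typically placed in the read-only .rodata section. */
const int lookup_table_ro[TABLE_LEN] = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };

/* Readers of the table do not change at all when it becomes const. */
int lookup(size_t index)
{
    assert(index < TABLE_LEN);
    return lookup_table_ro[index];
}
```

If you want to check where a symbol ended up, nm on the compiled object marks symbols in initialized data with D or d and symbols in read-only data with R or r.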
This is part of the output of the script. It shows the libraries whose private_dirty data is more than 200 KB.
/opt/gnome/lib/libbonoboui-2.so.0.0.0: 216 KB (spread among 36 mappings)
/lib/libresolv-2.4.so: 224 KB (spread among 56 mappings)
/lib/libncurses.so.5.5: 236 KB (spread among 14 mappings)
/usr/lib/libaudiofile.so.0.0.2: 240 KB (spread among 40 mappings)
/opt/gnome/lib/libgobject-2.0.so.0.800.5: 240 KB (spread among 64 mappings)
/usr/lib/libstdc++.so.6.0.8: 260 KB (spread among 39 mappings)
/opt/gnome/lib/libgdk-x11-2.0.so.0.800.10: 264 KB (spread among 48 mappings)
/usr/lib/libpng12.so.0.1.2.8: 264 KB (spread among 48 mappings)
/usr/lib/libfreetype.so.6.3.8: 276 KB (spread among 50 mappings)
/opt/gnome/lib/libgconf-2.so.4.1.0: 276 KB (spread among 46 mappings)
/lib/libpthread-2.4.so: 288 KB (spread among 72 mappings)
/opt/gnome/lib/libgnomeui-2.so.0.1200.0: 288 KB (spread among 36 mappings)
/lib/libm-2.4.so: 296 KB (spread among 82 mappings)
/opt/gnome/lib/libecal-1.2.so.3.2.8: 300 KB (spread among 10 mappings)
/usr/X11R6/lib/libX11.so.6.2: 312 KB (spread among 52 mappings)
/usr/lib/libfontconfig.so.1.0.4: 352 KB (spread among 48 mappings)
/lib/libdl-2.4.so: 368 KB (spread among 102 mappings)
/lib/libnss_nis-2.4.so: 392 KB (spread among 102 mappings)
/lib/libnss_compat-2.4.so: 392 KB (spread among 102 mappings)
/usr/lib/libssl.so.0.9.8: 400 KB (spread among 50 mappings)
/usr/lib/libasound.so.2.0.0: 400 KB (spread among 40 mappings)
/lib/libnss_files-2.4.so: 408 KB (spread among 106 mappings)
/opt/gnome/lib/libgnomevfs-2.so.0.1200.2: 420 KB (spread among 42 mappings)
/lib/libnsl-2.4.so: 528 KB (spread among 136 mappings)
/lib/ld-2.4.so: 640 KB (spread among 166 mappings)
/opt/gnome/lib/libgtk-x11-2.0.so.0.800.10: 704 KB (spread among 48 mappings)
/usr/lib/libxml2.so.2.6.23: 768 KB (spread among 48 mappings)
/usr/lib/xulrunner-1.8.0.1/libxul.so: 780 KB (spread among 2 mappings)
/opt/gnome/lib/libbonobo-2.so.0.0.0: 880 KB (spread among 44 mappings)
/opt/gnome/lib/libORBit-2.so.0.0.0: 960 KB (spread among 48 mappings)
/lib/libc-2.4.so: 1256 KB (spread among 249 mappings)
/usr/lib/libcrypto.so.0.9.8: 2160 KB (spread among 54 mappings)
This alone amounts to almost 16 MB. My basic desktop was running (i.e. nautilus, panel, etc.) along with Evolution, Epiphany, and Gaim. By adding const here and there we can save at least 16 MB from the footprint of such a basic desktop.
On my machine, all the private_dirty mappings (not just the ones with over 200 KB of data) take up about 25 MB total. This is quite hefty if you think of a low-end machine with 128 MB of RAM.
Update: Andrea Bedini has a patch to constify a bunch of places in libpng. Kick ass!
Update: as David Turner was wise to point out, not all of that 16 MB comes from non-const things. A lot of it comes from relocations performed by the dynamic linker. Andrea's patch kills almost all of .data for libpng and moves it to .rodata, but the final writable mapping is only slightly smaller.
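One common way to cut down on relocations, sketched here under my own assumptions (this is not what Andrea's patch does, and the color names are made up), is to replace a table of string pointers with a single string blob plus a table of offsets. In a shared library every pointer in a table needs a relocation when the library is loaded, even if the table is const; offsets are plain integers and need none.

```c
#include <assert.h>
#include <stddef.h>

/* All the strings packed into one array, separated by NUL bytes.
 * Hypothetical data, for illustration only. */
static const char color_names_blob[] =
    "red\0"
    "green\0"
    "blue";

/* Byte offset of each string inside the blob: no pointers, no relocations. */
static const unsigned char color_name_offsets[] = { 0, 4, 10 };

#define N_COLORS (sizeof color_name_offsets / sizeof color_name_offsets[0])

/* Return the name for an index, or NULL if the index is out of range. */
const char *color_name(size_t index)
{
    assert(sizeof color_names_blob <= 256); /* offsets must fit in a byte */
    if (index >= N_COLORS)
        return NULL;
    return color_names_blob + color_name_offsets[index];
}
```

The price is that the offsets have to be kept in sync with the blob by hand or by a generator script, so this is only worth it for tables that rarely change.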
Improving our memory debugging infrastructure
The first step in reducing a program's memory consumption is to get a breakdown of how it uses memory. This is analogous to optimizing for speed, where you use a profiler to see which parts of the program take the most time. The best tool to get memory breakdowns for plain old C programs is Valgrind.
To make Valgrind run really well for a GNOME application, you need to do some things first:
Compile your ORBit package with --enable-purify. This makes ORBit clear certain memory areas after using them, so that Valgrind, which scans memory for pointers much like a garbage collector does, will not think they are full of garbage pointers.
Set a G_DEBUG=gc-friendly environment variable. This makes Glib clear certain memory areas after using them, too.
Set a G_SLICE=always-malloc environment variable. This completely disables the magazine and slab allocator in Glib, and makes it use plain malloc()/free() instead.
The biggest pain is recompiling ORBit, since you can't debug memory issues properly until you have rebuilt it, and you may want to switch back to an untouched version when not doing debugging.
Glib's debugging utilities:
Glib has a public g_parse_debug_string() function which lets you set up debugging flags for your programs. Let's say you are writing a program that supports three debugging flags. You could implement them like this:
#include <glib.h>

/* One bit per debugging flag */
enum {
  DEBUG_LOG_ACTIONS      = 1 << 0,
  DEBUG_ENABLE_PROFILING = 1 << 1,
  DEBUG_CHECK_SANITY     = 1 << 2
};

/* Human-readable names for the flags above */
static const GDebugKey debug_keys[] = {
  { "log-actions",      DEBUG_LOG_ACTIONS },
  { "enable-profiling", DEBUG_ENABLE_PROFILING },
  { "check-sanity",     DEBUG_CHECK_SANITY }
};

const char *str;   /* g_getenv() returns a string you must not modify or free */
guint debug_flags;

str = g_getenv ("MY_PROGRAM_DEBUG");
if (str != NULL)
  debug_flags = g_parse_debug_string (str, debug_keys, G_N_ELEMENTS (debug_keys));
else
  debug_flags = 0;

...

if (debug_flags & DEBUG_LOG_ACTIONS)
  write_to_logfile ("something happened");
That is, you define a few flags and pick human-readable names for them, and then feed a user-supplied string to g_parse_debug_string(). Normally that string comes from an environment variable. If you want to turn on two of those flags for your program, you would run it like this:
export MY_PROGRAM_DEBUG=log-actions:check-sanity
./my-program
Now, g_parse_debug_string() has a few quirks:
Flags must be separated with a ":" character. If you use MY_PROGRAM_DEBUG=log-actions;check-sanity, the program won't get any flags.
Flags are case-insensitive. However, some flags are defined with hyphens and some with underscores, and g_parse_debug_string() is not forgiving there. If you pass log_actions instead of log-actions, it won't work. This is a problem since Glib defines G_DEBUG keys of different forms: gc-friendly, fatal_warnings, fatal_criticals.
As yet another quirk, the always-malloc flag is for the G_SLICE environment variable, not for the usual G_DEBUG.
The g_mem_gc_friendly flag:
When you run a program that uses Glib with G_DEBUG=gc-friendly, it turns on a global variable called g_mem_gc_friendly. Libraries and your own programs can look at this variable to see if they should do special stuff to make themselves friendlier to garbage collectors and to memory checkers like Valgrind.
ORBit:
So, having to recompile ORBit with --enable-purify is painful. It could very well look at the g_mem_gc_friendly variable described above, instead of using its compile-time "#ifdef ORBIT_PURIFY" stuff.
Tasks for volunteers:
Make g_parse_debug_string() be forgiving of hyphens-vs-underscores.
Make ORBit use g_mem_gc_friendly instead of a compile-time option.
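For the first task, here is a rough, Glib-free sketch of what a forgiving parser could look like. It is not a patch against g_parse_debug_string(): the DebugKey struct is a hypothetical stand-in for GDebugKey, and parse_debug_string_forgiving() and its helpers are made-up names. It keeps ":" as the only separator and stays case-insensitive, but treats "-" and "_" as the same character.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GDebugKey, so the sketch builds without Glib. */
typedef struct {
    const char *key;
    unsigned    value;
} DebugKey;

/* Fold case and map '_' to '-' so both spellings compare equal. */
static int normalize_char(char c)
{
    return c == '_' ? '-' : tolower((unsigned char) c);
}

/* Does the token of length len match key, ignoring case and '-' vs '_'? */
static bool key_matches(const char *token, size_t len, const char *key)
{
    for (size_t i = 0; i < len; i++) {
        if (key[i] == '\0')
            return false;   /* key is shorter than the token */
        if (normalize_char(token[i]) != normalize_char(key[i]))
            return false;
    }
    return key[len] == '\0';   /* key must not be longer than the token */
}

/* Parse a ':'-separated flag string and OR together the matching values. */
unsigned parse_debug_string_forgiving(const char *string,
                                      const DebugKey *keys, size_t nkeys)
{
    unsigned flags = 0;

    assert(string != NULL && keys != NULL);

    while (*string != '\0') {
        size_t len = strcspn(string, ":");   /* length of the next token */

        for (size_t i = 0; i < nkeys; i++)
            if (key_matches(string, len, keys[i].key))
                flags |= keys[i].value;

        string += len;
        if (*string == ':')
            string++;   /* skip the separator */
    }
    return flags;
}
```

With the three keys from the earlier example, both log_actions:check-sanity and LOG-ACTIONS:check_sanity would turn on the same two flags, while a string separated with ";" would still yield no flags.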
Yesterday afternoon, my Thinkpad's display started turning itself off randomly. If I pushed down on the laptop's surface, right below the keyboard, it would work again. Today the display started showing garbage while in X, and vertical stripes on half of the screen while in text mode, rendering it unusable. The rest of the machine still works, and I had a chance to copy all my data to my desktop machine. Fortunately, the laptop is still under warranty; IBM says they'll pick up the laptop for servicing over the next few days, and in theory it won't cost me a cent.
Now that is service, quite different to the experience I had with Dell. When the keyboard of my old Dell laptop suffered from a lethal overdose of chocolate milk, I called Dell, telling them that I was perfectly willing to pay for repairs since my laptop was out of warranty by then. They just ignored me, never routed me to someone who could actually help, and were a huge pain in the ass. Eventually I bought a new keyboard in downtown Mexico City (think Akihabara, just dirtier) and replaced it myself.
Why aren't the world markets all up today? There's a new tinderbox for jhbuild in which your own computer can participate!
Does your graphical application call g_spawn_*() directly? You usually want to use gdk_spawn_*() instead: this will let you specify the appropriate screen in which the child program will be launched.
This is the same class of bug as in applications that use gnome_desktop_item_launch() instead of gnome_desktop_item_launch_on_screen().
If you want to be a good citizen, you can improve the documentation for the g_spawn_*() functions to mention this.
Near the end of last year, the GNOME Foundation hired our docs hacker Shaun McCance to write a short guide to give people an overview of the GNOME platform. I'm proud to announce that Shaun did a wonderful job!
Read it online: Overview of the GNOME Platform.
Federico Mena-Quintero <federico@gnome.org> Thu 2006/Apr/06 14:52:02 CDT