Projects for Summer of Code 2008
This year I will mentor at most two of the following four projects:
Making evolution-data-server's calendar smaller and faster (for GNOME). E-d-s was written in the very early days of Evolution, and it needs some profiling love. In this project you will profile the calendar part of e-d-s to make it use less memory, and be faster as well with an optimized query engine.
Reducing memory fragmentation in GNOME (for GNOME). The GNOME platform libraries do many allocations of small objects. Your task will be to do an analysis of memory fragmentation, similar to what was done for Firefox 3.
Making removable hard drives Just Work(tm) (for openSUSE). Currently, when you plug in a blank hard drive through USB, nothing happens. The device node gets created as /dev/sdb or whatever, but you don't see anything happen on your screen. Your task will be to improve things around HAL, PolicyKit, and gnome-volume-manager to detect this situation, and to make the right thing happen.
Reliable unmounting of removable media (for openSUSE). When you want to unmount or eject a removable volume that is being used, you get a meaningless error message. There is no way to know which programs are still using the volume. Your task will be to make GNOME tell you which processes are using the volume, and give you a chance to kill them.
If you are a student, please see GNOME's very nice page on information for students. This will be useful to you even if you choose one of the projects for openSUSE. Don't miss out on this great opportunity to become a well-recognized member of the free software community and win USD 4500!
Philip, you are looking at this from the viewpoint of email. Email, addressbooks, and calendars are pretty different beasts.
Email has very particular access patterns, which are totally different from calendars and addressbooks.
With email, the prevailing model is that you display a usually huge linear list of messages, and even the summarized information can be pretty big. Searching is Nontrivial(tm).
In calendars, you almost always look at a subset of "events which occur in a certain range of dates", by virtue of the calendar's view (monthly/weekly/daily). Practically the only exception to this is when you want to do "show me events which contain $substring"; that's when you actually want to show a linear list of all the events that match.
Evolution still doesn't get this right, by the way. If I'm thinking, "when did I meet with Joe?", I type "Joe" in the search box — but Evolution keeps showing the current monthly/weekly/daily view, and I have to manually fiddle with the date ranges to actually find which days contain the event I'm looking for. Substring searches should switch to the "list view" mode.
Addressbooks are funny, because the access patterns for "my addressbook" and "my company's directory" are very different. Your addressbook contains at most a few hundred entries, and you may be able to deal with it comfortably by scrolling through a linear list, which is sorted by last name. You turn on searching when you don't remember a person's last name, so you type their first name in the search box.
In contrast, your company's directory is potentially huge and you never want to show it as a linear list. You actually want to type a person's name in a search box, and then pick a match from a small list. (Something must be wrong somewhere, as people keep mailing me when they mean my colleague with the beautiful name).
What does this have to do with evolution-data-server? E-d-s was always meant to be a local daemon to store your calendar and addressbook. It's the local Model in a Model/View split for all the processes which want to access your calendar/addressbook data. It was also meant to act as a local cache for remote calendars, which could be slow/huge/etc — that is, for stuff which is far away or not in the same scale as your own little appointments.
I really don't know how much of that data gets duplicated in the daemon's clients, but if it is a lot, then the clients are doing something wrong. For example, a calendar event (VEVENT in iCalendar parlance) contains a lot of administrative crap: a UUID, a sequence number, a stringified timezone identifier, blah blah blah, in addition to the actual human-readable stuff that I can see in my calendar (summary, start/end times, alarms). Here's a complete VEVENT:
BEGIN:VEVENT
UID:20070925T174038Z-22251-100-1-25@cacharro
DTSTAMP:20070925T174038Z
DTSTART:20070926T160000Z
DTEND:20070926T170000Z
SEQUENCE:2
SUMMARY:openSUSE project meeting #opensuse-project
CLASS:PUBLIC
TRANSP:OPAQUE
CREATED:20070925T174109
LAST-MODIFIED:20070925T174109
BEGIN:VALARM
X-EVOLUTION-ALARM-UID:20070925T174109Z-5309-100-1-13@cacharro
DESCRIPTION:openSUSE project meeting #opensuse-project
ACTION:DISPLAY
TRIGGER;VALUE=DURATION;RELATED=START:-PT15M
END:VALARM
END:VEVENT
That's 484 bytes of data, which is mostly fluff. These structures tend to explode in size when you parse them, but a neurotic calendar could keep the nice-and-small VEVENT string instead of a parsed structure, and re-parse it and re-generate it every time the event changes.
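To make the "keep the string, re-parse when needed" idea a bit more concrete, here is a minimal Python sketch. It is not how e-d-s stores events. CompactEvent, prop, and with_prop are hypothetical names, and the parsing is deliberately naive: it ignores iCalendar line folding and returns the first line whose property name matches, including lines inside a nested VALARM.

```python
class CompactEvent:
    """Holds only the raw VEVENT text; nothing is parsed until asked for."""

    __slots__ = ("raw",)  # no per-instance dict, just the one string

    def __init__(self, raw):
        self.raw = raw

    def prop(self, name):
        """Return the value of the first property called `name`, or None."""
        for line in self.raw.splitlines():
            key, sep, value = line.partition(":")
            # "TRIGGER;VALUE=DURATION" -> property name is "TRIGGER"
            if sep and key.split(";")[0] == name:
                return value
        return None

    def with_prop(self, name, value):
        """Re-generate the event text with one property value replaced."""
        lines = self.raw.splitlines()
        for i, line in enumerate(lines):
            key, sep, _ = line.partition(":")
            if sep and key.split(";")[0] == name:
                lines[i] = f"{key}:{value}"  # keep any parameters on the key
                break
        return CompactEvent("\r\n".join(lines))
```

The trade is CPU for memory: every lookup scans a few hundred bytes, which is probably fine for a calendar with a few hundred or a few thousand events, but that is a guess that would need profiling.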
And again: my local calendar, which goes back to 2001, only contains about 500 events, and is 670 KB in size. The craziest calendar I've seen is Michael's, which has a couple thousand events, but that's not even an order of magnitude larger than mine.
Summary: there's no reason for e-d-s or calendar clients to have huge memory requirements; they are just doing something silly. You can keep a whole calendar, in memory, in less space than a fat JPEG from your digicam.
Ross posted about people wanting to write replacements for evolution-data-server, and I agree with him that doing so is a mistake.
When we originally wrote Evolution, a lot of backend-ish things were left in a "we'll fix this later" state. It was more important to have a simple, working backend and a totally awesome frontend GUI, than to have a super-optimized backend and a shitty GUI.
The code in e-d-s is a good example of something that could get a lot of easy love if people simply took the time to understand what is wrong with it in the first place.
Some things that could be done to e-d-s reasonably easily:
Evolution-data-server has a reputation for becoming a very big process, even though its resident size is small. On my box, pmap(1) says that e-d-s has a 14 MB heap with 12 MB resident. That's tiny by today's heap standards. However, there are 8 or so threads with an 8 MB stack each, most of which is of course never used; that is about 64 MB of address space reserved for stacks alone. An easy cure for people's fears of the total VM size would be to give those threads a smaller, explicitly sized stack.
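As a rough illustration of the stack-size point, here is a small Python sketch. E-d-s is written in C, where the equivalent knob would be pthread_attr_setstacksize(); Python's threading.stack_size() is used here only as an analogue. The 256 KiB figure and the function names are assumptions for the example, not measured values for e-d-s.

```python
import threading

# Hypothetical figure: 256 KiB per worker instead of an 8 MB default.
# The right value depends on how deep the real call stacks get.
SMALL_STACK_BYTES = 256 * 1024


def reserved_stack_bytes(num_threads, stack_bytes):
    """Address space reserved for thread stacks (virtual, not resident)."""
    return num_threads * stack_bytes


def run_worker_with_small_stack(work):
    """Run work() in a thread that is created with the smaller stack."""
    # stack_size() affects threads created after the call and returns
    # the previous setting (0 means "platform default").
    previous = threading.stack_size(SMALL_STACK_BYTES)
    try:
        result = []
        worker = threading.Thread(target=lambda: result.append(work()))
        worker.start()
        worker.join()
        return result[0]
    finally:
        threading.stack_size(previous)  # restore the earlier setting
```

With these numbers, eight threads would reserve 2 MiB of address space for stacks instead of 64 MiB. Resident memory barely changes either way, which is the point: the scary number is the virtual size.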
Avoid scanning the entire list of calendar events when someone makes a query. The most fundamental API in the e-d-s calendar is, "give me all the events that fall within $range_of_timestamps". For example, the Evolution calendar's week view will say, "give me all the events for the current week". E-d-s then goes and looks at all the events it knows about, sees if they intersect the specified time range (generating recurrences if needed), and builds a list of results. This makes each query run in O(num_events) time — nobody cleans up their old events, so this degrades pretty quickly over time!
There are various ways to fix this. You could read the calendar.ics file into memory, but instead of storing a dumb linked list of events, you could store an interval tree based on the time range for events. An event that occurs only once has a range of [t1, t1 + duration]. A recurring event that repeats indefinitely has a range of [t1, ∞). If you are careful with the way you organize your data, you can avoid even looking at the memory pages for the event info, and just look at the pages where the interval tree is stored.
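Here is a minimal sketch of the interval tree idea in Python. It is one possible shape (a balanced binary tree keyed on start time, where each node also remembers the largest end time below it), not a design that e-d-s uses. Node, build_tree, and events_in_range are hypothetical names; timestamps are plain numbers and math.inf stands in for "repeats forever".

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: float
    end: float
    uid: str          # only an identifier; the event body lives elsewhere
    max_end: float    # largest `end` in this node's whole subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(events):
    """Build a balanced tree from (start, end, uid) tuples."""
    return _build(sorted(events))


def _build(items):
    if not items:
        return None
    mid = len(items) // 2
    start, end, uid = items[mid]
    left = _build(items[:mid])
    right = _build(items[mid + 1:])
    max_end = max(end,
                  left.max_end if left else -math.inf,
                  right.max_end if right else -math.inf)
    return Node(start, end, uid, max_end, left, right)


def events_in_range(root, q_start, q_end):
    """Return uids of events overlapping [q_start, q_end], by start time."""
    found = []
    _query(root, q_start, q_end, found)
    return found


def _query(node, q_start, q_end, found):
    # Nothing in this subtree ends late enough to reach the query range.
    if node is None or node.max_end < q_start:
        return
    _query(node.left, q_start, q_end, found)
    if node.start <= q_end and node.end >= q_start:
        found.append(node.uid)
    # Everything to the right starts even later, so skip it if we are past q_end.
    if node.start <= q_end:
        _query(node.right, q_start, q_end, found)
```

Old one-off events get pruned by the max_end check without being visited. Open-ended recurring events always survive the pruning, so the caller would still have to generate their recurrences to see whether one actually lands inside the range.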
My calendar.ics file is 700 KB, and addressbook.db is 400 KB. So, why does e-d-s have a resident heap of 12 MB? "Use a database for the calendar" is total overkill, and less reliable than a simple text file. Someone needs to actually look at how much space e-d-s uses in memory for its various data structures, and then see why they become so big when loaded.
When your bathroom becomes kind of dirty and gross, do you demolish it and build a new bathroom? No, you clean it up. It's a shame that software seems so easy to demolish and rebuild.
I'm surrounded by beautiful women these days.
Pavlov has a great summary of reduction of memory usage in Firefox 3. It would be very interesting to try to reduce (for example) memory fragmentation in our platform libraries.
GtkFileChooser bug week: closing summary
Completion in the file chooser
The new completion code for GtkFileChooserEntry is in GTK+ trunk now! It works much more smoothly than before the bug week:
Instead of being a mishmash of asynchronous calls, the entry is now very careful about completion. There are two cases: autocompletion and explicit completion. Autocompletion is when the entry automatically inserts-and-selects the common prefix as you type, and happens only in the Open modes. Explicit completion is what you invoke with Tab at any time; it inserts the common prefix and moves the cursor past it.
Completion will not happen unless the correct folder is completely loaded. When some form of completion needs to happen, the entry first parses its input to see if it needs to initiate a folder load. If the folder is already completely loaded, then completion happens immediately. Otherwise, the entry initiates a folder load or waits for the current folder to finish loading.
Tab completion is allowed to happen even if the cursor is not at the end of the entry. Also, Tab completion has a new feedback mechanism. You'll get an Emacs-like tip when there are ambiguities in completion, or no matches, or when you type invalid input. See the screenshot above!
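The common-prefix logic described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the code in GtkFileChooserEntry (which is C). The function name, the feedback strings, and the convention that names=None means "folder still loading" are all assumptions made for the example.

```python
import os


def complete(typed, names):
    """Return (new_text, feedback) for the text typed so far.

    `names` is the list of file names in the folder, or None if the
    folder has not finished loading yet.
    """
    if names is None:
        # Completion must wait until the folder is completely loaded.
        return typed, "loading"
    matches = [name for name in names if name.startswith(typed)]
    if not matches:
        return typed, "no matches"
    # Longest string that every match starts with.
    common = os.path.commonprefix(matches)
    if len(matches) == 1:
        return common, "sole completion"
    return common, "ambiguous"
```

For autocompletion the entry would insert and select the part of the result beyond what was typed; for Tab it would insert it and move the cursor past it. The feedback value is what would drive the Emacs-like tip.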
The only remaining thing is to fix the popup suggestion window. The code has some FIXMEs that identify the correct place for the suggestion window to be activated or scrolled. Unfortunately, GtkEntryCompletion doesn't let us do these things easily, and it will need some changes. I'll work on that next.
What's next
I'll be merging all of the bug week's stable patches into a GTK+ package for openSUSE. These patches are all in the gtk-2-12 branch already.
I'd like the completion code to get some more testing before I backport it to GTK+ 2.12. If you can test SVN trunk, it would be greatly appreciated :)
Finally
Thanks to all the people who participated in the bug week! Bryan Yunashko for testing, Michael Meeks for fixing hard-to-find bugs, Will Lachance for adding polish to the file list, Chris Wang for figuring out the default response for dialogs, Carlos Garnacho for adding polish to the path bar, and everyone who tested fixes.
Pretty soon I'll organize a bug week for multiscreen-related bugs. Or a bug fortnight — there are so many bugs there.
I just read through the GNOME Foundation's Annual Report for 2008 (PDF), and it is very nice to see a well-produced, pretty-looking summary of the big things that have been going on in GNOME. I'm flattered to have been asked by the Foundation to write a preface for the report. This preface is more or less a summary of what I blurted out during the closing ceremony of last year's GUADEC. Go read it! :)
GtkFileChooser bug week: day 4
Michael found an interesting race condition in GtkFileSystemGnomeVFS: if two threads request the same folder, and the folder was not loaded already, then one of them won't get as much data from the file system as it requested. This is why the file chooser sometimes doesn't display icons — a bug that had me quite puzzled the few times I saw it.
Will Lachance fixed the display of dates for files which were modified today.
People are getting duplicated entries in the file list when using the Unix backend (especially in Debian/Gentoo). Can someone please try the patch in revision 19678 to see if it fixes things? I committed that the other day for a bug in the Unix backend, but I'm not sure if it helps with the bug about duplicated files.
Chris Wang found that GtkFileChooserDialog required a default response button to be defined so that pressing Enter in the filename entry would work correctly. He sent a patch so that the dialog will detect when it doesn't have a default button defined, and will automatically define one itself. This is why you hit Enter in Brasero's file chooser and nothing happened.
It turns out that other people have full git-svn clones of GTK+ and GLib:
If we find that a good number of maintainers and regular contributors are using Git for their daily work, we may as well just switch :)
GtkFileChooser bug week: day 3
I have Tab completion in the file chooser mostly done now. It seems to work much better than before; Tab completion is predictable and doesn't get in your way, it is faster, and it has nice tooltip-like feedback to tell you what it did (a la Emacs). The bug has the current patch attached, if you'd like to test it.
Now I need to finish the popup suggestion list, hopefully tomorrow with some hacking of GtkEntryCompletion.
People have been mailing me interesting patches or reminders of patches for the file chooser — thanks, guys! I want to integrate these patches tomorrow as well. Please catch me on irc.gnome.org #gtk+ or irc.freenode.net #opensuse-gnome.
You can get the Git repository for the completion patch like this:
git clone http://www.gnome.org/~federico/git/gtk+.git
cd gtk+
git checkout origin/bgo314873-filechooser-tab-in-the-middle-of-entry
Git repository with GTK+'s full history
Ever since I discovered the joys of git-svn, I've been using it to interact with svn.gnome.org instead of using the normal svn client.
Since then I've been living with partial clones of GTK+'s repository, as I could never bear to clone GTK+'s entire history, which goes back to 1997. There have been over 19,000 commits in GTK+ during its lifetime.
Over the weekend I left a screen session running on www.gnome.org, doing a full git-svn clone of GTK+'s history. The result is about 460 MB of pure love (about 180 MB for GTK+'s full history (!) and a lot more for the git-svn administration information (!!)).
Now you can have the full history of GTK+ on your machine, in very little space, with branches, tags, and everything!
Instructions
Download gtk+-git-svn-clone-20080304.tar.bz2 (198 MB).
Unpack it somewhere.
Edit gtk+/.git/config and find the lines that say this:
[svn-remote "svn"]
url = http://svn.gnome.org/svn/gtk+
If you only want anonymous, read-only access to SVN, leave it like that. If you have an SVN account, change the "url" line to "url = svn+ssh://username@svn.gnome.org/svn/gtk+".
Go to the toplevel gtk+ directory and run these commands:
git-svn fetch   # will update all the branches/tags
git-svn rebase  # will update your master branch with the state of SVN trunk
To get a list of all of GTK+'s branches, do git branch -a.
That's it. I hope the people at the GTK+ Hackfest will find this useful for avoiding network latencies :)
Using a baby to test accessibility features
What if you lost the use of one of your hands, say, by having to carry a baby while your wife tries to get some sleep?
Today I enabled Sticky Keys in GNOME's accessibility options, and I'm happy to say that it Just Worked(tm). No restarting of anything, it's easy to figure out once you enable it, and it's also easy to disable.
Now I think I want a chorded keyboard.
GtkFileChooser bug week: day 1
The file chooser bug week is rolling along. Today's progress:
Made packages available for testing in the openSUSE Build Service.
Backported single-click shortcuts and skipping over the filename extension for openSUSE 10.3.
Reviewed and committed Garnacho's clever patch for enabling the scrollwheel in the path bar.
Worked on completion in the filename entry a bit more.
If you have outstanding patches for the file chooser (distros, this applies to you as well!), please catch me on irc.gnome.org #gtk+ or irc.freenode.net #opensuse-gnome.
Federico Mena-Quintero <federico@gnome.org> Mon 2008/Mar/03 18:52:26 CST