Life Online Part 2: The Data


In the follow-up to our previous shot about life online, Jono Bacon and Stuart ‘Aq’ Langridge continue the discussion about how to manage a seemingly endless array of data online, from email to micro-blogging to feed readers to documents and more, and the many and sundry privacy and convenience issues involved. In a world filled with detail, can we strike a balance between safe and simple?

Remember, this is the beginning of the conversation, and we want to hear your thoughts! What are your views on the privacy, freedom and convenience issues surrounding online web services, and how do you feel we get a good balance? Should we bypass the web in search of local fat clients, and what solutions could resolve these issues? Share your thoughts in the shot comments below…

40 Comments to “Life Online Part 2: The Data”

  1. danielsbrewer 26 January 2010 at 12:10 pm #

    Google seems pretty committed to letting you import and export all your data – see “The Data Liberation Front” http://www.dataliberation.org/. So if you can cope with the privacy issues, I think Google is a good choice. I do wish there was a one-click button or script that would let you back up all your stuff though; shame backupify has the potential of being expensive with the S3 storage.

    • danielsbrewer 26 January 2010 at 12:36 pm #

      Ooops, just noticed backupify is free at the moment … bon! Just need to get local backups and then we have reached nirvana.

      • sorin7486 26 January 2010 at 12:43 pm #

        does backupify also restore ? cuz if it doesn’t we still have a way to go to reach nirvana… I mean I’d like to see something like moving all of my stuff from one service to another in something like half an hour… most of that time being spent waiting for the data transfer.

      • sil 26 January 2010 at 12:49 pm #

        yep, until the end of the month :)

  2. sorin7486 26 January 2010 at 12:35 pm #

    It would be really nice to have a sort of NAS drive that could back up everything from all of your accounts. But in order to do that we do need more open standards for data.

    I’ve never been that confident in online storage and I don’t use it for anything that I feel is critical. I always keep backups at home of anything important.

    And it’s also a matter of getting locked in. I don’t want to stick with gmail for the rest of my life for example. I know what it took to move over from yahoo and I know lots of people that still haven’t done the switch just because it’s so hard.

  3. sorin7486 26 January 2010 at 12:39 pm #

    BTW how about making a segment about losing one’s data? … it seems to me that computers allow us to remember everything, which isn’t really normal. I find it very healthy to delete big parts of that old stuff from time to time… what do you guys think?

  4. Nasarius 26 January 2010 at 1:04 pm #

    I’ve recently been looking for a good, simple GTD system and this sort of issue has been driving me crazy.

    RememberTheMilk is really good at giving you your data in all sorts of formats, but the basic system is kinda limited and the interface is terrible.

    I love TomBoy; it has a ton of fantastic features that are really easy to use. But syncing is frankly broken. Just trying to get a simple set of notes synced with Ubuntu One to a couple computers resulted in indecipherable errors and half my data being lost. Even if it did work, I’d have to manually invoke sync on a regular basis, which sucks. And even then I still can’t get at it on my Android phone.

    So for the moment I’m pretty much stuck with the Hipster PDA; that is, a bunch of index cards clipped together. It’s simple, accessible everywhere, and highly flexible. I can’t back it up, but I’m not the sort who loses things. I suppose the only real downside is that there’s no search feature.

    Any other recommendations would be very welcome.

    • jono 27 January 2010 at 1:48 am #

      On a task level, I am digging Getting Things GNOME – see http://gtg.fritalk.com/

      • mg 27 January 2010 at 4:13 am #

        GTG looks quite interesting. I’ve never had any use for a note taker (like Tomboy), but GTG looks like something I could actually use. I suppose you would (at some point) sync that with Ubuntu One as well if you include it in Ubuntu in future?

        • sil 27 January 2010 at 10:51 am #

          Ryan Paul has already written a desktopcouch back end for GTG; applications that use desktopcouch for storage are automatically synced with Ubuntu One. Just install and use Ryan’s backend.

  5. marxjohnson 26 January 2010 at 1:18 pm #

    I agree with Jono’s points on Ubuntu One. Having your data existing on your local machine (or all of them) AND on the server, all automatically synced is pretty much the ideal solution.

    I probably come under the banner of one of Aq’s “survivalist nutters” in that I prefer to have my own servers than relying on a third party to provide the service (especially if I’m not paying them anything). Is there a tool like Ubuntu One that lets you relatively easily set up your own server? By “like Ubuntu One” the key is that your files sync across the client devices, they’re not just accessible as shared files.

    • Hamish 26 January 2010 at 6:24 pm #

      You can get the couchdb part of ubuntu one on your own server – This part of the review mentions that “It’s worth noting that users who don’t want to rely on Ubuntu One for note syncing can put the URL of their own Snowy server in the preference dialog.” (and the same for evolution contacts).

      http://arstechnica.com/open-source/reviews/2009/11/good-karma-ars-reviews-ubuntu-910.ars/5

      And there are plenty of solutions for the file part – a regular rsync, or I’m sure there’d be some inotify based scripts out there …
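
A minimal sketch of the "regular rsync" idea mentioned above, using only the Python standard library. It is a one-way mirror, not two-way sync, and it is not how rsync or Ubuntu One work internally: the function name `mirror` and the "same size and not newer" rule are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> list[str]:
    """Copy files from src to dst when missing or changed; return what was copied."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = path.stat(), target.stat()
            # Assumption: same size and source not newer means "already synced".
            if s.st_size == t.st_size and s.st_mtime <= t.st_mtime:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also copies the modification time
        copied.append(rel.as_posix())
    return copied
```

Running `mirror(Path("notes"), Path("/mnt/backup/notes"))` from a cron job (both paths hypothetical) would approximate a scheduled rsync; an inotify-based script would call the same function whenever a file changes instead of on a timer.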

    • Rodney Dawes 26 January 2010 at 7:55 pm #

      I tend to disagree, even as an Ubuntu One developer. Synchronization is almost always the problem, and only opens up a new set of problems to deal with. If you’re synchronizing data, then it’s a de facto guarantee that at some point in time, your data is going to be out of sync. It also means that there can be conflicts, and if you have a lot of places where that data is replicated to, then you now have to deal with resolving those conflicts in every location.

      What is really ideal here, is that your data is accessible wherever you are. Always. Without fail. If you are always with a connection to the Internet, you will never need “offline” support. The obvious, simple way to work around the fact that we don’t currently have always available connectivity, is to synchronize things. Unless every device in the loop of synchronization is exactly the same, there are going to be problems. And nobody has ever gotten this right. Not Palm, not Microsoft, not Apple, nobody. There are things we can do to make the experience better, and there are things we are working on to do that. But it is always going to be suboptimal as long as we are synchronizing data.

      (Disclaimer: This is not pessimistic. Nor is it necessarily the opinion of anyone else on the Ubuntu One team, or Canonical. It is a highly refined personal opinion based on experience of 10+ years working in the field.)
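
To make the conflict problem described above concrete, here is a rough sketch of a three-way comparison between the last synced state and two replicas. It is an illustration only, not how Ubuntu One or desktopcouch resolve conflicts; the function name `three_way_merge` and the dictionary model are assumptions, and a missing key is represented by `None`, so stored values may not be `None`.

```python
def three_way_merge(base, local, remote):
    """Merge two replicas against their last common state.

    Returns (merged, conflicts). A key is in conflict when both
    sides changed it since `base` and they disagree.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:
            value = l  # both sides agree (or both deleted the key)
        elif l == b:
            value = r  # only the remote side changed
        elif r == b:
            value = l  # only the local side changed
        else:
            conflicts.append(key)
            value = l  # keep the local value provisionally
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

Every replica that runs this against a different `base` can end up with a different conflict list, which is the "resolving those conflicts in every location" problem in miniature.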

  6. Deepgeek 26 January 2010 at 1:34 pm #

    OK, So I do the “Talk Geek To Me” podcast. I produce a script for every show, and I have a webpage for the show, plus a variety of audio files for my listener’s convenience.

    I think it is important not to ignore “tried and true” technology as you do things online. Most people use blogging software, but I find that if I write good, solid, compliant HTML and keep a copy of my site on my own system, I already have a backup, and don’t have to worry about running some remote script or something like that. I also have a side benefit of having a copy of my work on my laptop, so I don’t have to worry about getting online someplace remote when I want to work on my stuff.

    I do the same for “cloud bookmarks.” I do a regular webpage with the utility “Listgarden,” and keep my own flat-file copy on my own system that way.

    Sometimes less “trendy” ways have solid solutions that outweigh “keeping up” with “new new new” tech!

    DG

  7. Rodney Dawes 26 January 2010 at 7:27 pm #

    Obvious plug here: http://launchpad.net/central-services

    The best solution is to not force your users over to some new interface on a new web site, but rather to integrate your service with the applications they’re already using, to make their experience and transition as easy and simple as possible. The advent of forums is what really pushed people to use the web for such things. Twitter, Facebook, Myspace, etc… are all just extensions of the classic web forum.

    • sil 26 January 2010 at 7:36 pm #

      …which was in itself an obvious extension of usenet, no? Part of the issue is that people find it easier to develop web services, since you get cross-platformness for free, among other things.

      • Rodney Dawes 26 January 2010 at 8:02 pm #

        Except they’re not really developing web services. They’re developing web sites which provide a facade for a potential service. None of them really care about the user as it were. What they care about is that you use their service to store your data. Millions of unique hits per day, plus lots of advertising, means they get lots of money. A lot of the ones that do provide some API for doing stuff, are only designed in such a way so that you can only write web apps to access your data. Which means you still need a web server and all that jazz to deal with it. The OAuth/OpenID mess has only furthered that failure point. I want to see REST APIs that make sense for everyone, not just web developers. I want my data accessible in my applications.

        • sil 26 January 2010 at 8:50 pm #

          Hence the design of desktopcouch. :)

  8. ghsqa 26 January 2010 at 8:27 pm #

    I’ve been working on a little app to solve the problem of closed online storage and cloud services.

    Have your own cloud service, that way you manage your own cloud. Have an open cloud service which 3rd parties can integrate into your cloud.

    It’s little more than a proof of concept at the moment, but hopefully one day it will be something worth using and useful to people.

  9. stuartfinlay 26 January 2010 at 11:15 pm #

    I did an exercise a year ago to decide what services I would use and where to host my data. This turned into a fairly comprehensive spreadsheet covering things from password management, email, rss, calendar, contacts, mobile sync to web access, im, photo and file storage, cost, privacy, security, open sourceness etc. I ended up comparing a whole range of open source Groupware suites like Horde, Kolab etc against Google, Windows Live, Zoho, Zimbra, ClarkConnect and even a DIY VPS that I would access with mutt, irssi over ssh.

    In the end it came down to the usual big players, MS, Google and surprisingly the CLI-only VPS. Zoho would be where I would look if I started a small business but Google won out at the end of the day with its SSO, consumer features and constantly evolving product suite so I use their services wherever possible and it’s been amazing to not have to worry about syncing and moving information. It’s been brilliant for me and I realised that I don’t need to keep using those old services just because other people I know are still on them. I’ve got to dump MSN, Skype, JungleDisk/Amazon S3, Xmarks and TomTom. I can’t recommend this exercise enough if you want to streamline your online life. Something for scratchpad even? Backupify will hopefully handle my backups once they fix all the bugs which are preventing it from working atm.

    Looks like I hate privacy and freedom. YMMV.

    • jono 27 January 2010 at 1:52 am #

      Would be awesome to see your full spreadsheet. :-)

  10. beerdoodle 27 January 2010 at 12:04 am #

    I think it’s reasonable to make sure that you don’t put anything in the cloud that you wouldn’t mind losing. Definitely don’t keep the only copy of something in the cloud. If my Google account got wiped I wouldn’t be at a total loss. I wouldn’t be happy, but at least all of the data I really care about that is important to me wouldn’t be gone forever.

    I keep hard backups at home on an external hard drive (no need for a server yet). I will also burn DVDs periodically of things like family photos, since hard drives won’t last forever. If they go out, you’re screwed, but if you have them backed up on a DVD you can save them on another hard drive later. I will admit that this is becoming less and less of an option since DVDs will hold only 4.7 gigs and storage needs well exceed this. Blu-ray disks will be a better option when I get a burner. Dual-layer Blu-ray disks can hold 50 gigs, that’s more like it.

    I also believe in off site backup. I don’t do this yet, but it’s something I am more and more concerned with as my digital data collection continues to rise.

    • jono 27 January 2010 at 1:51 am #

      The issue here though is that you may have duplicates of data (such as email in an IMAP server as well as a local mirror of it on a machine), but people worry about others snooping on their data.

    • hessiess 27 January 2010 at 8:19 pm #

      I definitely agree with this; storing anything in only one location, be it a cloud service or a local computer, is a very bad idea.

  11. mg 27 January 2010 at 4:32 am #

    How well do systems like this work for applications that don’t involve just backing up personal data? I am thinking of things like recording data from water systems, environmental monitoring systems, backing up and restoring configurations from data acquisition systems, etc.?

    That is traditionally a difficult problem because these systems are distributed over a large area, the communications channels are sometimes rather shaky (especially in bad weather), and a lot of the people installing them quite frankly aren’t really that adept at wide area networking. The volumes of data however aren’t typically that great. The difficulty is in synchronising it (from the field to the database). In fact, it is quite normal to only connect intermittently in order to save on data charges.

    In reverse, you have service people who replace hardware, but then the correct current configurations need to be downloaded from central storage to the field.

    There are a number of projects that I’ve worked on in the past where I would have loved to have something like this as an off the shelf solution. Is that feasible, or is that the wrong approach entirely?
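
One common shape for the field-to-database problem described above is store-and-forward: buffer readings locally and drain the buffer whenever the link happens to be up. The sketch below is only an illustration under stated assumptions: the class name `StoreAndForward` and the `send` callable are hypothetical, `send` is assumed to raise `ConnectionError` when the link is down, and the buffer is kept in memory, whereas a real field device would persist it to disk.

```python
from collections import deque


class StoreAndForward:
    """Buffer readings locally and forward them in order when possible."""

    def __init__(self, send):
        self._send = send  # hypothetical uploader; raises ConnectionError on failure
        self._pending = deque()

    def record(self, reading):
        """Queue a reading; never touches the network."""
        self._pending.append(reading)

    def flush(self):
        """Try to upload queued readings oldest-first; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # link is down: keep the reading and retry on a later flush
            self._pending.popleft()  # drop only after a successful send
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._pending)
```

Because a reading is removed only after `send` returns, a dropped connection never loses data, though the receiving side may see a reading twice if the link fails after delivery but before the call returns.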

  12. funcrunch 27 January 2010 at 8:29 am #

    Personally, what’s even more important to me than accessibility is reliability and accountability. For those reasons, I refuse to use a free service for e-mail, web hosting, or photo hosting. I do have a Gmail account (had to get one when I got an Android phone) and I use Google for my calendar, contacts, analytics, and a number of other things, but I pay nearly $30 a month for e-mail and web hosting and nearly $150/year for pro photo hosting. Even before I went pro with my photo business, I never considered using Picasa; Flickr did what I needed, and I never saw any Google integration advantage to using Picasa.

    I deal with the e-mail access issue by leaving messages on the server until I’m quite sure I’m ready to delete them. When I had a day job this wouldn’t be until I had a copy on both work and home computers. Now that I work from home, and don’t travel much, I try to delete mail from the server on a weekly basis. Yes that means I can’t instantly access last month’s mail from the local coffee shop – oh well!

  13. beerdoodle 27 January 2010 at 2:17 pm #

    True story:

    A friend of mine was playing an online game this weekend in a game server with like 15 of his friends. They ended up playing with an admin who booted their whole team because he didn’t like one of their usernames: 2gaymeninaboat. Well, they looked up the admin’s profile on the gaming site, then they looked at his Facebook page, Twitter account, Flickr account and any other social media they could find on the guy. Then they started playing with his mind. There were pictures of the guy on some fishing trip that they asked him about, and it kind of freaked him out that they knew about it. They got him to let all of them back in the server. He was a little freaked out that strangers could find out all kinds of personal information about him just by doing some simple basic searching.

    I think there is some credible caution that people should take. I try not to put too much personal information online and I always set privacy settings fairly strictly, especially on facebook. There are avenues that you can use to make your life public, but I think people should be very thoughtful about which ones they put sensitive personal information on. I also think that people should be more selective of how much of their lives they live online. Probably at LEAST half of my friends on facebook I don’t ever need to hang out with because they tell everything about themselves online. There is nothing to get to know, no real reason to spend time with them.

  14. beerdoodle 27 January 2010 at 2:19 pm #

    Is the “archive” button in gmail really a good thing?

    • draxil 27 January 2010 at 4:06 pm #

      Definitely! Empty inbox is fantastic, thunderbird 3 has a nice version of the same thing.

    • Gerv 29 January 2010 at 2:37 pm #

      Of course. Inbox Zero FTW.

      Gerv

  15. Shane Fagan 27 January 2010 at 7:13 pm #

    Well, the data protection issue has come up recently with the Yahoo switch in Lucid, and I think it’s very important for one company not to have too much info. If you use Google for everything, how safe is your info? I’m a big fan of using a desktop system; it’s like having your own info castle :) You have the walls around the info, you can hide info, and if you’re not silly and make sure to keep the door closed and the locks secure you can feel safe. Something about backing up info online doesn’t seem too safe. Google have been hacked recently; really any big company can be hacked, and I just prefer to keep my info in the relative anonymity of my desktop.

  16. hessiess 27 January 2010 at 8:33 pm #

    Personally I do not trust web services, especially free ones. Currently they are leaning too far towards vendor lock-in and do not provide any simple way to dump out or import data. If you are paying for a service, the company in question has more to lose than a free service, especially if the company is small. So I feel that they are less likely to make a mistake which would cause loss of data or a breach of security.

    The way I prefer to work results in the creation of a huge number of relatively small files; because of this, version control is absolutely essential, which is not provided by the majority of file sync services. There really is nothing better than TeX for writing technical documents, and no on-line `word processors’ come anywhere near the power and efficiency of a decent editor like Vim or Emacs.

    And yes, I do run my own server :)

  17. Michael "Crazy" Howell 28 January 2010 at 2:41 am #

    I’m a crazy nutter who is in the process of migrating to all local stuff.

  18. Stefano Zacchiroli 29 January 2010 at 6:04 pm #

    So, I’ve been having an interesting brief exchange with Aq on twitter. My starting tweet was as follows:

    open data is seriously hindered by closed source web services (hello #ubuntu one),because you can’t trust the API implementation #shotofjaq

    I’ll briefly expand here on that (yes Aq, you managed to convince me that it is worth posting something larger on the subject :-) )

    My main point is that advertising open data as the solution to privacy issues is no solution, or at least is currently being overrated. Those among us who are free software zealots have several reasons to be such, but an important one among them is trust.

    I don’t need to trust blindly someone giving me a binary program and telling me “just use this API which will work that way”. I can look at the code and ensure that it is doing what the API promises.

    Open data (at least as presented in this issue of Shot of Jaq) stops at the API level: I need to trust the other side of the API “contract” to do what the API promises (e.g. deleting the information I asked to delete). As long as the code of the web service implementing the API is not open (and that’s why I’ve mentioned Ubuntu One, which currently is not) I must stop there.

    Then I’ll agree that having, say, an AGPL-licensed web service won’t necessarily be enough to complete the trust path (the web service can, for instance, just claim to conform to some source code while in practice not doing so), but it will surely be an important extra step in the right (for me, YMMV) direction.

    Thanks for the show, it is high-quality radio (even if I don’t always share the conveyed messages).

    • sil 29 January 2010 at 7:44 pm #

      I see, and understand, your viewpoint. What I don’t get is — does this mean that you essentially don’t use the web for anything? The distinction between “web sites” and “web services” is pretty artificial, and even if you allow the distinction, search engines are pretty clearly a service. How far do you take the requirement that you need to see the underlying code before you’ll use a web browser to view anything?

      • Stefano Zacchiroli 30 January 2010 at 2:52 pm #

        Oh no, I do use some web services, pretty much as I’m forced to use some non-free app (e.g. a binary flash player) on my otherwise free gnu/linux box. That notwithstanding, I’m not particularly happy about that and I prefer free alternatives where available (e.g. identi.ca which would be perfect even without twitter, if only twitter had implemented open microblogging …).

        That’s why I don’t particularly welcome FLOSS community based distro pushing for closed web services. They should be well aware of the “culture of openness” and they should hence offer open web services to their communities.

        Bottom line: nothing wrong with using non-open web services when free alternatives are not available, but we should keep clear in our minds that that is not the ideal world wrt “standard” FLOSS ideals.

  19. tola 30 January 2010 at 12:25 pm #

    Very interesting topic!

    I’ve been running my own server from home for years now (details below). However, over time I’ve found that administering all this stuff yourself is just far too much hassle! Web applications in particular are very hard to keep up-to-date because they don’t tend to work very well with traditional package managers.

    These days I use GMail, Google Calendar, Google Docs, Google Reader, Rememberthemilk, Facebook, Twitter, Delicious, LastFM… the list goes on! After working for Google for a short time I realised that the backup and data redundancy they have in place is so much better than any backup system I could implement myself, it just isn’t worth the effort on my part! Also, web applications you run yourself don’t currently have the social/network effect of cloud-based services. I wrote down some opinions on cloud computing on my blog http://tola.me.uk/blog/2009/03/05/cloud_computing

    Having said all of that, there are some things that cloud-based services don’t currently do very well and there are certain types of information I’d rather have physical control over. That’s why I’m trying to start the Webian Home Server project – to make running a home server much easier, and to take advantage of emerging web standards for rich multimedia in the browser http://webian.org

    I’m a big advocate of open standards for data, even more than open source for software development. The ideal situation for me would be that web services would use open standards instead of proprietary APIs so that you could freely move your data between providers or host it yourself. Unfortunately standards by their nature only emerge when technologies become… standard, so restricting yourself to standards-based services means missing out on a shed load of innovation and cool features. For that reason I think there will always be compromise between features and openness.

    For those who might be interested, I’ve previously used my home server as:
    * Document server (filesystem mounted over sshfs or WebDAV from Mac/Linux)
    * Media server (Ampache and Samba server with XBMC as a client)
    * Calendar server (mod_dav and phpicalendar with Sunbird/iCal as a client)
    * Wiki (PHPWiki/MediaWiki)
    * Source code repository (Subversion + WebDAV + trac)
    * News reader (Gregarius)

    I’ve never found a mail and contacts server that I was happy with, but I’ve played with software like Horde and Hula/Bongo.

