Talk:Data.govt.nz

From Open NZ Wiki

Use this page to discuss the Open New Zealand moot requirements document for data.govt.nz. Please sign your comments. This page should end up as context for the finished document.

Fields

Not sure how to handle recurring datasets, e.g. the census. The census is conducted every five years, but the historical datasets are useful. Do we treat different censuses as separate datasets? Do we define a hierarchy, where there's "the census" and under it are the different years? I'm wary of getting into an ontological nightmare of attempting to figure out The Universal System for datasets; I smell a rathole. I guess we answer this by looking at whether the user cares. Probably not--full text search is all they need to find what they're after. Thoughts? --Ntorkington 00:17, 4 August 2009 (UTC)
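One way to picture the "separate datasets, no hierarchy" option is the sketch below. It assumes a flat list of records in which each census year is its own entry, with an optional free-text series label instead of a formal parent/child structure. The record fields, the `search` helper, and the sample entries are all hypothetical and only illustrate how plain full-text search could still group the years together.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    # Hypothetical fields, not the field list from the requirements document.
    title: str
    description: str
    series: str = ""             # optional grouping label, e.g. "Census"
    year: Optional[int] = None   # edition year, if the dataset recurs


def search(records: List[DatasetRecord], query: str) -> List[DatasetRecord]:
    """Return records whose title, description, or series contain every query word."""
    words = query.lower().split()
    hits = []
    for record in records:
        text = f"{record.title} {record.description} {record.series}".lower()
        if all(word in text for word in words):
            hits.append(record)
    return hits


# Illustrative entries only.
records = [
    DatasetRecord("Census 2001", "Population and dwelling counts", "Census", 2001),
    DatasetRecord("Census 2006", "Population and dwelling counts", "Census", 2006),
    DatasetRecord("Road crash statistics", "Reported crashes by region"),
]

census_years = [r.year for r in search(records, "census")]
```

Here a search for "census" finds both editions without anyone having defined a hierarchy, and "census 2006" narrows it to one. Whether that is enough for real users is exactly the open question above.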

Crawling

I'm not sure we want to go the route of requiring that data.govt.nz include a crawler that pulls out microformats: to get the microformats in there, you'd need to adapt a zillion CMSes around Government. If you were going to do that, you might as well have them send the info into the database. As far as I know, no Government datasets are encoded with microformats. It seems like a buzzword-compliance move, rather than something that solves a specific problem. --Ntorkington 02:38, 4 August 2009 (UTC)


I agree that there might be challenges to adoption of microformats for dataset description. Maybe it could be an optional thing. In defence of the idea, some SSC e-govt folks (both existing and past) are very bullish about the idea, based on their experiences with the NZGO portal and the NZGLS. I've also talked to one CRI who is very enthusiastic about using RDFa to describe their datasets online. --Julian Carver 03:09, 4 August 2009 (UTC)


The model of building a repository and getting people to populate it was tried for the NZ government portal, which was a services link repository.

Portal v1: The lessons learnt were: agencies don't have any incentive to enter data a second time. They often do a minimal job of tagging/describing, often not from the customer's viewpoint. Agencies bear the cost of double entry. Scalable. Over time, the quality degrades, as people move on and new people forget why they're doing the job.

Portal v2: The next thing tried was to centralise the data entry, to create a standard. But then agencies had no incentive to talk to the data entry people. Expensive: a single central agency bears the cost of double entry. Not scalable.

Portal v3: The NZ govt portal now uses a public search index built from the original agency web pages, overlaying it with some Vivisimo smarts about clustering. Cheaper and fully automated. Scalable.

When microformats started gaining popularity in 2006, I started promoting the following concept:

  • To what degree are your website pages usable by people or machines?
  • Appropriately marked up HTML pages, designed for people first and machines second, make information more accessible.
  • Government web pages are not brochureware; they are well-assembled data structures for information exchange.

Matthew Ross's work on a semantic web publishing standard builds on that concept. Potentially, agencies using a good CMS can publish a web page with sufficient metadata to allow anyone to build a "virtual" data repository on the fly.

So a machine-readable way to "discover" data sets is about the only option we haven't tried yet.

--Mike.pearson.nz 23:56, 4 August 2009 (UTC)
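To make the "discover" idea concrete, here is a minimal sketch of what a crawler could do with one agency page. It assumes agencies mark dataset links with a class attribute; the class name `dataset`, the sample page, and the `discover` helper are all invented for illustration and are not an existing microformat or anything from Matthew Ross's standard. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class DatasetDiscoverer(HTMLParser):
    """Collect the URL and link text of every <a class="dataset"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.datasets = []
        self._in_dataset_link = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "dataset" in classes:
            self.datasets.append({"url": attributes.get("href") or "", "title": ""})
            self._in_dataset_link = True

    def handle_data(self, data):
        # Text inside a marked link becomes the dataset title.
        if self._in_dataset_link:
            self.datasets[-1]["title"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_dataset_link = False


def discover(html):
    """Return the dataset descriptions found in one page of HTML."""
    parser = DatasetDiscoverer()
    parser.feed(html)
    return parser.datasets


# A made-up agency page: written for people first, with one machine-readable hint.
page = """
<p>Our agency publishes:</p>
<ul>
  <li><a class="dataset" href="/data/crashes.csv">Road crash statistics</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

found = discover(page)
```

The page stays an ordinary web page, and only the marked link is picked up. A real crawler would also need to fetch pages, resolve relative URLs, and cope with richer markup such as RDFa, none of which is attempted here.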


Thanks for making me think about this more, Mike. I agree: my architecture shouldn't be specified as a requirement--GTS and SSC will know best what works in their organisation. The important thing that we specify is that it works: that we can comment on datasets, that we can contribute if they lag, and so on. I'll remove anything that would specifically limit data.govt.nz to being manually fed. --Gnat 17:48, 18 August 2009 (UTC)

FONZ/SONZ

Given there are already accepted NZ & internationally standard metadata profiles (ANZLIC/ISO/GCMD/etc), I don't think we should be reinventing this particular wheel by defining a new list of fields. There are also standard metadata repositories/applications in use in NZ, such as GeoNetwork (FAO instigated, Open Source), which is likely to become more widely used in the near future.

Should we not be discussing here which profiles & which applications supporting these profiles are potentially suitable for use, rather than starting from scratch? --Pcreso

I don't feel comfortable mandating the use of a particular metadata standard or piece of cataloguing software. As you say, there are many standards, and I for one lack the expertise to choose meaningfully between them. The choice seems like one that depends on more factors than we have access to--other systems the catalogue must interface with, movement to/from standards within Government, etc. The list of fields in the requirements is the minimum set of information that will make it possible for consumers (us) of the data catalogue to crowdsource, search, etc. How that metadata is stored (vocabulary, etc.), and which other metadata are gathered, is left to the government to sort out. This Principle seemed to say it all: "Open standards based. It does use international standards for metadata (ANZLIC/ISO) to facilitate interoperability and data sharing. It does not use proprietary formats and protocols which restrict access & interoperability." Perhaps that principle needs to be strengthened, or another Compatibility section added, to point out the existing NZ projects that data.govt.nz would need to be compatible with. What do you think? --Ntorkington 19:37, 4 August 2009 (UTC)


Formats

Dataset formats

CSV/TXT, XML?

Dataset metarecord formats

CSV/TXT, XML, ATOM?
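
As a rough illustration of two of the candidate metarecord formats, the sketch below writes the same record as CSV and as XML using Python's standard library. The field names (`title`, `agency`, `url`), the sample values, and the `dataset` element name are placeholders chosen for the example, not the fields from the requirements document or any agreed schema. ATOM is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Placeholder metarecord; field names and values are illustrative only.
record = {
    "title": "Road crash statistics",
    "agency": "Example Agency",
    "url": "http://example.org/crashes.csv",
}


def to_csv(rec):
    """Serialise one metarecord as a header row plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


def to_xml(rec):
    """Serialise one metarecord as a flat XML element with one child per field."""
    root = ET.Element("dataset")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(record)
xml_text = to_xml(record)
```

Both outputs carry the same three fields, so offering more than one format is mostly a matter of serialisation rather than of collecting extra metadata.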
