Before and after editing

As an author, you naturally want to see what an editor delivers. Below are links to several examples of my work. The degree of editing or rewriting is agreed with the author. On the left is the original text as submitted, on the right the result after editing.

Published with the permission of the respective authors!
  • an acquisition letter
  • a blog post about sources of inspiration
  • an article: Ontwikkelen van professionele ruimte in de dagelijkse praktijk
  • an article: De keuze voor een vrije wil
  • a scientific article: The impact of crowdsourcing on spatial data quality indicators
    (see below)

Before editing

Quality indicators for crowdsourced geospatial information

Draft article for the Proceedings of the 6th GIScience International Conference on Geographic Information Science

Authors: M. van Exel (1), E. Dias (1,2) and S. Fruijtier (1)

1: Geodan S&R, President Kennedylaan 1, 1079MB Amsterdam, the Netherlands
2: Vrije Universiteit – FEWEB/RE, de Boelelaan 1105, 1081HV Amsterdam, the Netherlands

1. Introduction
It is irrefutable that crowdsourced geospatial information has soared over the last couple of years. For example, the statistics of OpenStreetMap (OSM) show an accelerating growth (OSM 2010). At the same time, devices equipped with GPS became mainstream with the introduction of the latest generation of smartphones, diminishing the threshold to participate in crowdsourced geospatial information projects.
Together with the growth in volume, the usage of crowdsourced geospatial information grew extensively as well. For example, OSM maps are used in different commercial projects as background maps. This increased usage makes it important to identify quality indicators for crowdsourced geospatial information (Haklay and Weber 2008, Goodchild 2007) in order to:
1. compare and integrate crowdsourced geospatial information with official, authoritative data (from e.g. National Mapping and Cadastral Agencies) and commercial data (e.g. TeleAtlas and NAVTEQ);
2. determine if the crowdsourced geospatial information is fit for the intended purpose;
3. predict the quality developments for certain areas (by extrapolating the quality trends from a well-described region to a starting region).
According to ISO 19113 (2002), quality is the “totality of characteristics of a product that bear on its ability to satisfy stated and implied needs”. Quality only has a meaning if we have a common understanding of the definitions of these characteristics. In this abstract, we propose a model to incorporate the quality characteristics of crowdsourced geospatial information.

2. Quality elements

2.1 Quality elements
Different definitions of the spatial quality elements exist. Van Oort (2006) compiled five important sources and identified eleven elements of spatial data quality: Lineage, Positional accuracy, Attribute accuracy, Logical consistency, Completeness, Semantic accuracy, Usage/purpose/constraints, Temporal quality, Variation in quality, Meta-quality, and Resolution. These elements are used to describe the quality of geo-data collected and produced with a commissioned effort, which entails a specified and uniform method to gather and process the data. Therefore, the quality of such data can be assumed homogeneous and consistent.
On the other hand, volunteered geographic collections are characterised by heterogeneous and diverse quality, due to the fact that they are collected using different methods (e.g. GPS tracks, image tracing) and by different individuals with different motivations, preferences and activity locations. Moreover, contributors and contributions are not distributed evenly over space.
In order to describe the quality of crowdsourced geospatial information we introduce the concept of Crowd Quality (CQ). CQ attempts to quantify crowd wisdom, or the collective intelligence that goes into the crowdsourced generation of geospatial information, in a spatio-temporal context so as to make it applicable to geospatial data. We suggest making CQ operational through a two-dimensional approach: User-related quality aspects and Feature-related quality aspects. These aspects can comprise the existing quality elements, extended with quality elements specific to crowdsourced data.
The User dimension manifests the quality of information contributions from an individual contributor's perspective. This is a quintessential characteristic of crowdsourced data; unlike institutional geospatial information collection, the individuals contributing to the product have no a priori established status or qualification, nor does the context of crowdsourced geospatial information products usually establish technical or organisational constructs that limit or scale the individual's scope for contributing.
The Feature dimension approaches Crowd Quality from the perspective of the spatial feature. Rather than looking solely at the established dimensions of spatial quality (e.g. accuracy and completeness), any quality indicator for user-generated spatial features should encompass the collective experience, knowledge and effort of the individuals who contributed to that feature. In this abstract, we set out to establish operational indicators for both the User and the Feature dimension of Crowd Quality.

2.2 User Quality
Individuals can have any number of personal motivations for contributing to crowdsourced geospatial information projects. Ongoing research (Nedovic-Budic 2010) suggests that the strongest motives tend to be non-monetary; contributors are driven by the goals and values of the crowdsourced geospatial information project, by altruism, by community building, expectations and learning, by development, self-expression and employment, but also by personal place-based needs and local knowledge. That local knowledge lowers their barrier to start contributing to a crowdsourced project, as it enables them to identify missing or incorrect information with relative ease. When the individual decides to continue contributing to the project, he continues to benefit from this local knowledge in much the same way, making contributions with more confidence where he demonstrates a high degree of local knowledge.
The following hypotheses, when successfully tested, help establish an operationalisation of the User's local knowledge: familiarity with an area can be correlated with the spatio-temporal pattern of his contributions, and the quality of the user’s contributions is higher for the areas he is most familiar with.
A second component of the User dimension is his experience in contributing to the project. We hypothesise that the quality of a user's contribution is correlated with his overall experience in contributing to the project.
Experience may be quantified using the amount of time the user has been registered with the project, the number of GPS traces he registered with the project, the number of features added or edited, but also his activity in virtual as well as real-life forums established within the context of the project.
By participating in these forums, the User gains not only experience by learning from and generally interacting with his peers, but through this interaction, both in real life and in the virtual forums, he also gains recognition, which comprises the third and last proposed component of the User dimension of Crowd Quality.
Recognition in online social networks and online contexts that allow for user contributions is often established by tokens. These can be awarded by other users as recognition for a specific contribution, or awarded by the system when a certain threshold of activity, which may be quantitative or qualitative, is met. This type of User recognition is largely unknown in crowdsourced geospatial data. This puts a strain on our ability to assess Crowd Quality, as the peer reviewing of contributions lies at the core of internal quality assurance of crowdsourced information repositories. We therefore need to devise implicit peer reviewing indicators. These can be derived from subsequent contributions by individual users. It cannot be established a priori and in an isolated case whether the lack of subsequent modifications for a spatial feature created by one User is to be considered positive recognition of that specific contribution. If we consider a larger spatial context and find one feature that has received few improvements, whereas most features within the spatial extent under consideration have undergone many revisions, that lack of subsequent edits may be considered positive recognition. Further study into the geo-social dynamics of crowdsourced geospatial information projects will reveal additional implicit peer reviewing patterns.

2.3 Feature Quality
Traditional quality assurance and assessment for geospatial information departs from the spatial features that comprise the information entity. That individual spatial feature in a crowdsourced context also reveals a number of quality indicators that pertain to the concept of Crowd Quality that was introduced earlier. In crowdsourced datasets quality elements can be different for similar features, while in traditional datasets the quality elements will be uniform. For example, a commissioned dataset about restaurants will include a defined and specified set of attributes, while a user contributing to a crowdsourced dataset can define his own attributes (e.g. opening times, type of food, cosiness).
Positional accuracy and precision show a degree of overlap with traditional, conformance-based spatial data quality assessment. In a crowdsourced context, the lineage of the feature is of particular interest. Crowdsourced data is typically contributed using an array of different methodologies and tools. Some data is brought in through import from other sources that have been demonstrated to be available under a compatible license. The features brought into the crowdsourcing context through such an import have a very clear lineage with regard to positional accuracy and precision. Another common tool used for contributing to crowdsourced geospatial information is derivation from GPS points collected in the field. Here the positional accuracy is harder to establish, mostly because the accuracy and precision metadata is usually stripped from the GPS data, and Users may not attribute their contributions to a GPS source. Tracing over satellite or aerial imagery is yet another common method used, and there are numerous others. In a crowdsourced context, any single feature may have been affected by several different methods. Moreover, the spatial accuracy and precision of neighbouring features may have affected the positioning of the feature under consideration. Further study is required to reveal the dynamics that make up this component.
Semantic accuracy is another component of the Feature dimension of Crowd Quality, pertaining to the completeness and internal consistency of the attribute metadata. Considerations and challenges for operationalising this component are many. First and foremost, a predefined schema for attribute metadata is not common in crowdsourced geospatial data projects. Much trust is put in the self-organising capacity of crowdsourcing ecosystems. This lack of a priori organisation allows the creative input of individuals and small groups with specific interests to benefit the project by generating a breadth of information that would not otherwise be feasible, but at the same time poses a threat to internal consistency.
Lastly, the diversity of feature types and feature attribution is hypothesised to have a positive correlation with the quality of the information.

2.4 Hybrid components
We have seen that the different components of the two dimensions of Crowd Quality can often not be considered as independent, disparate entities. The user dimension manifests itself in the contributions that a user makes to the database, and must therefore be measured through the features. Feature quality, on the other hand, is ultimately intertwined with the User Quality of the individuals that contributed to the feature under consideration.
One prominent indicator for Crowd Quality that cannot be grouped in either of the two dimensions is spatio-temporal dynamism and persistence. This indicator deals with such characteristics as:
• How many different users have contributed to a feature?
• How has a feature developed over time?
These characteristics have intertwined User and Feature quality dimensions that cannot be separated from one another.

3. Discussion and further work
This abstract proposes a framework approach to the quality of crowdsourced geospatial information by using crowd dynamics. We introduce different indicators that take into account Spatial Crowd Activity (surrogated by number of edits and number of editors) and Temporal Crowd Activity (number of edits per time period). We also propose the concept of Relative Crowd Activity (number of edits relative to an enclosing / neighbouring area).
Future work will involve operationalising and testing the assumptions made. Operationalisation will be aimed at confronting the User and Feature dimensions of Crowd Quality.

References

• Goodchild M.F. 2007. Citizens as Sensors: The World of Volunteered Geography, VGI Specialist Meeting Position Papers, Santa Barbara, CA.
• Haklay, M and Weber, P, 2008, OpenStreetMap: User-Generated Street Map. IEEE Pervasive Computing, 7(4): 12-18.
• ISO 19113:2002, Geographic information - Quality principles.
• Nedovic-Budic, Z and Budhathoki, N.R., 2010. Motives for VGI Participants. Ongoing research presented at the workshop 'VGI for SDI', Wageningen University, NL, April 16th, 2010.
• OSM, 2010, OpenStreetMap: Wiki, http://wiki.openstreetmap.org
• Van Oort, PAJ, 2006, Spatial data quality: from description to application, PhD Thesis, Wageningen University, NL, 132 p.

After editing

The impact of crowdsourcing on spatial data quality indicators

Published in September 2010 in: Proceedings of the 6th GIScience International Conference on Geographic Information Science, 213.

M. van Exel (1), E. Dias (1,2) and S. Fruijtier (1)
Editing: Ceciel Fruijtier

1: Geodan S&R, President Kennedylaan 1, 1079MB Amsterdam, the Netherlands
2: Vrije Universiteit – FEWEB/RE, de Boelelaan 1105, 1081HV Amsterdam, the Netherlands


ABSTRACT

The increasing usage of crowdsourced geospatial information calls for a quality indicator for this type of data. In this paper, we introduce the concept ‘Crowd Quality’ (CQ) to describe and quantify the quality of crowdsourced geospatial information. CQ is based on a two-dimensional approach: User-related quality aspects and Feature-related quality aspects, which are often interdependent. Feature Quality of crowdsourced geospatial data can be assessed by the same quality elements usually used for features from traditional, conformance-based spatial databases. However, for crowdsourced databases these quality elements are far more complex. We plan to use crowd dynamics to address this complexity.


1. Introduction

Crowdsourced geospatial information has soared over the last couple of years. For example, the statistics of OpenStreetMap (OSM) show an accelerating growth (OSM 2010). At the same time, devices equipped with GPS became mainstream with the introduction of the latest generation of smartphones, diminishing the threshold to participate in crowdsourced geospatial information projects.
Together with the growth in volume, the usage of crowdsourced geospatial information grew extensively as well. For example, OSM maps are used in different commercial projects as background maps. This increased usage makes it important to identify quality indicators for crowdsourced geospatial information (Haklay and Weber 2008, Goodchild 2007) in order to:
1. compare and integrate crowdsourced data with institutional data (from e.g. National Mapping and Cadastral Agencies) and commercial data (e.g. TeleAtlas and NAVTEQ);
2. determine fitness for the intended purpose;
3. predict the quality developments for certain areas.

In this article, we introduce the concept ‘Crowd Quality’ (CQ) to describe and quantify the quality of crowdsourced geospatial information.


2. Quality elements

The term ‘quality’ only has a meaning if we have a common understanding of its definition. According to ISO 19113 (2002), quality is the “totality of characteristics of a product that bear on its ability to satisfy stated and implied needs”. For spatial quality elements, different definitions exist. Van Oort (2006) compiled five important sources and identified eleven elements of spatial data quality: Lineage, Positional accuracy, Attribute accuracy, Logical consistency, Completeness, Semantic accuracy, Usage/purpose/constraints, Temporal quality, Variation in quality, Meta-quality, and Resolution.

These elements are used to describe the quality of geo-data collected and produced with a commissioned effort, which entails a specified and uniform method to gather and process the data. Therefore, the quality of such data can be assumed homogeneous and consistent.
However, volunteered geographic collections are characterised by heterogeneous and diverse quality, due to the fact that they are collected using different methods (e.g. GPS tracks, image tracing) and by different individuals with different motivations and preferences. Moreover, contributors and contributions are not distributed evenly over space. To address this problem, we introduce the concept of Crowd Quality (CQ) to describe and quantify the quality of crowdsourced geospatial information.


3. Crowd Quality

Crowd Quality (CQ) attempts to quantify the ‘collective intelligence of the crowd generating data’ in a spatio-temporal context. CQ is based on a two-dimensional approach: User-related quality aspects and Feature-related quality aspects. These aspects can comprise the existing quality elements, extended with quality elements specific for crowdsourced data.

The User dimension manifests the quality of information contributions from an individual contributor's perspective. This is a quintessential characteristic of crowdsourced data. Unlike institutional geospatial information collection, the individuals contributing to the product have no a priori established status or qualification. Nor is the individual’s scope for contributing determined by organisational constructs.

The Feature dimension approaches Crowd Quality from the perspective of the spatial feature. Rather than looking solely at the established dimensions of spatial quality (e.g. accuracy and completeness), any quality indicator for user-generated spatial features should encompass the collective experience, knowledge and effort of the individuals who contributed to that feature. We aim to establish operational indicators for both User Quality and Feature Quality.

3.1 User Quality
We suggest three components to determine User Quality: Local knowledge, Experience, and Recognition.

Individuals can have any number of personal motivations for contributing to crowdsourced geospatial information projects. Ongoing research (Nedovic-Budic 2010) suggests that the strongest motives have either an idealistic or a free-time nature, or are driven by personal place-based needs and local knowledge. This local knowledge enables the contributor to identify missing or incorrect information with relative ease. The following hypotheses, when successfully tested, help establish an operationalisation of the User's local knowledge (see the sketch after the list):
• Familiarity with an area can be correlated with the spatio-temporal pattern of his contributions.
• The quality of the user’s contributions is higher for the areas he is most familiar with.
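
To illustrate what such an operationalisation might look like, here is a minimal sketch in Python. It is our illustration, not part of the original article: it assumes contributions are available as (user, area) pairs, where an area could be a grid cell or named region, and takes the share of a user's edits falling in an area as a crude familiarity proxy.

from collections import Counter, defaultdict

def familiarity_scores(contributions):
    # contributions: iterable of (user_id, area_id) pairs; both identifiers
    # are hypothetical, as the article does not prescribe a data model.
    per_user = defaultdict(Counter)
    for user_id, area_id in contributions:
        per_user[user_id][area_id] += 1
    # Familiarity proxy: the share of a user's edits that fall in each area.
    return {user: {area: n / sum(counts.values()) for area, n in counts.items()}
            for user, counts in per_user.items()}

# Example: user 'a' scores twice as high for area 'x' as for area 'y'.
print(familiarity_scores([('a', 'x'), ('a', 'x'), ('a', 'y')]))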

A second component of the User dimension is his experience in contributing to the project. We hypothesise that the quality of a user's contribution is correlated with his overall experience in contributing to the project.
Experience may be quantified using the amount of time the user has been registered with the project, the number of GPS traces he registered with the project, the number of features added or edited, but also his activity in virtual as well as real-life forums within the context of the project. By participating in these forums, the User not only gains experience by learning from and generally interacting with his peers; through this interaction, both in real life and in virtual forums, he also gains recognition.
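
Purely as an illustration, such a quantification could be a weighted sum of these signals. The sketch below is ours; the field names and weights are assumptions, and calibrating them against measured contribution quality would itself be part of the future work.

from dataclasses import dataclass

@dataclass
class UserActivity:
    days_registered: int  # time registered with the project
    gps_traces: int       # GPS traces registered with the project
    edits: int            # features added or edited
    forum_posts: int      # activity in project forums

def experience_score(u: UserActivity) -> float:
    # Placeholder weights for illustration only; not taken from the article.
    return (0.1 * u.days_registered + 0.5 * u.gps_traces
            + 1.0 * u.edits + 0.3 * u.forum_posts)

print(experience_score(UserActivity(365, 20, 150, 12)))  # 200.1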

Recognition comprises the third and last proposed component of the User dimension of Crowd Quality. In online social networks and other online contexts that allow for user contributions, recognition is often established by tokens. These can be awarded by other users as recognition for a specific contribution, or by the system when a certain threshold of activity is met. This threshold may be quantitative or qualitative. This type of User recognition is largely unknown in crowdsourced geospatial data. This puts a strain on our ability to assess Crowd Quality, as the peer reviewing of contributions lies at the core of internal quality assurance of crowdsourced information repositories. Therefore, we need to devise implicit peer reviewing indicators.
Implicit peer reviewing indicators can be derived from subsequent contributions by individual users. If we consider a larger spatial context and find one feature that has received few improvements, whereas most features within the spatial extent under consideration have undergone many revisions, that lack of subsequent edits may be considered positive recognition. This recognition cannot be established a priori nor in an isolated case. Further study into the geo-social dynamics of crowdsourced geospatial information projects will reveal additional implicit peer reviewing patterns.
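
One way this indicator might be made concrete, sketched under our own assumptions: compare a feature's revision count against the median of its spatial neighbourhood, and treat an untouched feature in a heavily revised extent as implicitly endorsed. The threshold is arbitrary.

from statistics import median

def implicitly_endorsed(feature, revisions, neighbourhood, ratio=0.5):
    # revisions: dict mapping feature id -> number of subsequent edits
    # neighbourhood: feature ids within the spatial extent under consideration
    local_median = median(revisions[f] for f in neighbourhood)
    # Few edits only count as recognition when neighbours saw many revisions.
    return local_median > 0 and revisions[feature] <= ratio * local_median

revisions = {'f1': 1, 'f2': 8, 'f3': 9, 'f4': 7}
print(implicitly_endorsed('f1', revisions, ['f1', 'f2', 'f3', 'f4']))  # True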

3.2 Feature Quality
Traditional quality assurance and assessment of geospatial information departs from the spatial features that comprise the information entity. In crowdsourced datasets quality elements can be different for similar features, while in traditional datasets the quality elements will be uniform. For example, a commissioned dataset about restaurants will include a defined and specified set of attributes, while a user contributing to a crowdsourced dataset can define his personal attributes (e.g. opening times, type of food, cosiness).
Feature Quality of crowdsourced geospatial data can be assessed by the same quality elements usually used for features from traditional, conformance-based spatial databases. However, for crowdsourced databases these quality elements are far more complex. Of particular interest are Lineage, Positional accuracy and Semantic accuracy.

Crowdsourced data is typically generated using an array of different methodologies and tools. Some data is imported from other sources, if available under a compatible license. These imported features have a very clear lineage with regard to positional accuracy and precision. Another common tool used for contributing to crowdsourced geospatial information is derivation from GPS points collected in the field. Here the positional accuracy is harder to establish, mostly because the accuracy and precision metadata is usually stripped from the GPS data, and Users may not attribute their contributions to a GPS source. Tracing over satellite or aerial imagery is yet another common method used, and there are numerous others. In a crowdsourced context, any single feature may have been affected by several different methods. Moreover, the spatial accuracy and precision of neighbouring features may have affected the positioning of the feature under consideration. Further study is required to reveal the dynamics that determine positional accuracy and precision of any specific feature.

Another complex quality element is semantic accuracy, pertaining to the completeness and internal consistency of the attribute metadata. A predefined schema for attribute metadata is not common in crowdsourced geospatial data projects. Much trust is put in the self-organising capacity of crowdsourcing ecosystems. This lack of a priori organisation allows the creative input of individuals and small groups with specific interests to benefit the project by generating a breadth of information that would not otherwise be feasible, but at the same time poses a threat to internal consistency.
Lastly, the diversity of feature types and feature attribution is hypothesised to have a positive correlation with the quality of the information.


3.3 Interdependency of User and Feature Qualities
User Quality and Feature Quality can often not be considered as independent, disparate entities. The user dimension manifests itself in the contributions that a user makes to the database, and must therefore be measured through the features. Feature Quality, on the other hand, is ultimately intertwined with the User Quality of the individuals that contributed to the feature under consideration.
Spatio-temporal dynamism and persistence has intertwined User and Feature Quality dimensions that cannot be separated from one another. This indicator deals with such characteristics as:
• How many different users have contributed to a feature?
• How has a feature developed over time?



4. Further work

Future work aims at the operationalisation of the Crowd Quality concept. We propose a framework approach to determine the quality of crowdsourced geospatial information, using crowd dynamics. We will introduce different indicators that take into account Spatial Crowd Activity (surrogated by number of edits and number of editors), Temporal Crowd Activity (number of edits per time period) and Relative Crowd Activity (number of edits relative to an enclosing / neighbouring area).
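
To indicate how these three indicators might be computed, here is a minimal sketch under our own assumptions: edits are recorded as (user, month, area) tuples, areas are opaque identifiers, and simple counts stand in for more refined surrogates. None of this code comes from the article itself.

from collections import Counter

def spatial_crowd_activity(edits, area):
    # Surrogate: number of edits and number of distinct editors in the area.
    in_area = [(user, month) for user, month, a in edits if a == area]
    return len(in_area), len({user for user, _ in in_area})

def temporal_crowd_activity(edits, area):
    # Number of edits per time period (here: per month label).
    return Counter(month for _, month, a in edits if a == area)

def relative_crowd_activity(edits, area, region):
    # Edits in the area relative to an enclosing / neighbouring region.
    n_area = sum(1 for _, _, a in edits if a == area)
    n_region = sum(1 for _, _, a in edits if a in region)
    return n_area / n_region if n_region else 0.0

edits = [('a', '2010-03', 'x'), ('b', '2010-03', 'x'), ('a', '2010-04', 'y')]
print(spatial_crowd_activity(edits, 'x'))               # (2, 2)
print(temporal_crowd_activity(edits, 'x'))              # Counter({'2010-03': 2})
print(relative_crowd_activity(edits, 'x', {'x', 'y'}))  # 0.666...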


References

• Goodchild M.F. 2007. Citizens as Sensors: The World of Volunteered Geography, VGI Specialist Meeting Position Papers, Santa Barbara, CA.
• Haklay, M and Weber, P, 2008, OpenStreetMap: User-Generated Street Map. IEEE Pervasive Computing, 7(4): 12-18.
• ISO 19113:2002, Geographic information - Quality principles.
• Nedovic-Budic, Z and Budhathoki, N.R., 2010. Motives for VGI Participants. Ongoing research presented at the workshop 'VGI for SDI', Wageningen University, NL, April 16th, 2010.
• OSM, 2010, OpenStreetMap: Wiki, http://wiki.openstreetmap.org
• Van Oort, PAJ, 2006, Spatial data quality: from description to application, PhD Thesis, Wageningen University, NL, 132 p.