The implementation requires the user to supply a value for Eps, the neighbourhood radius. The process is as follows:
- read all points into an ArrayList of type Point (where Point is a user-defined class)
- populate a distance array containing the distance between each point and every other point
- create a k-dist array where k = 4, i.e. for each point, what is the distance to its 4th nearest neighbour?
- flag core points, i.e. points whose 4-dist is <= Eps
- flag border points, i.e. non-core points that lie within Eps of a core point
- by default, the remaining points are noise
- populate an array of linked core points, i.e. pairs of core points within Eps of each other
- where core points are linked, directly or through a chain of links, put them in the same cluster
- add each border point to the same cluster as its nearest core point
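The steps above can be sketched in Python as follows. This is a minimal sketch, not the author's implementation (which reads points into an ArrayList of a user-defined Point class); the Point fields, the function name dbscan and the use of -1 to mark noise are assumptions. Like the steps it follows, it builds the full distance array, so time and memory grow with the square of the number of points.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    # Stand-in for the user-defined Point class (fields are assumed).
    x: float
    y: float


def dbscan(points, eps, k=4):
    """Return a cluster ID for each point, or -1 for noise."""
    n = len(points)
    # Distance array: distance between each point and every other point.
    dist = [[math.hypot(p.x - q.x, p.y - q.y) for q in points] for p in points]

    # k-dist array: distance from each point to its kth nearest neighbour.
    kdist = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kdist.append(others[k - 1] if len(others) >= k else math.inf)

    # Core points: k-dist <= Eps.
    core = [d <= eps for d in kdist]
    # Border points: non-core points within Eps of at least one core point.
    border = [
        not core[i] and any(core[j] and dist[i][j] <= eps for j in range(n))
        for i in range(n)
    ]

    # Links between core points that are within Eps of each other.
    links = {
        i: [j for j in range(n) if j != i and core[j] and dist[i][j] <= eps]
        for i in range(n) if core[i]
    }

    # Everything starts as noise (-1). Core points connected by a chain of
    # links get the same cluster ID (a connected-components walk).
    labels = [-1] * n
    next_id = 0
    for start in links:
        if labels[start] != -1:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            for j in links[stack.pop()]:
                if labels[j] == -1:
                    labels[j] = next_id
                    stack.append(j)
        next_id += 1

    # Each border point joins the cluster of its nearest core point.
    core_ids = [j for j in range(n) if core[j]]
    for i in range(n):
        if border[i]:
            nearest = min(core_ids, key=lambda j: dist[i][j])
            labels[i] = labels[nearest]
    return labels


# Made-up example: two tight groups, one border point and one outlier.
pts = [Point(0, 0), Point(1, 0), Point(0, 1), Point(1, 1), Point(0.5, 0.5),
       Point(2.5, 0.5),
       Point(100, 100), Point(101, 100), Point(100, 101), Point(101, 101),
       Point(100.5, 100.5),
       Point(50, 50)]
print(dbscan(pts, eps=2))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

Here the point at (2.5, 0.5) is a border point (its 4-dist is about 2.55, but it lies within 2 of three core points), and the point at (50, 50) is left as noise.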
I ran the process on accident data from the Heatons North ward with Eps values of 10m and 20m. The following images focus on a particular section of road:
Figures: the selected section of road with an Eps of 10m, and with an Eps of 20m.
The core points are black, border points are yellow and noise points are shown as crosses. The number indicates the cluster ID (the 10m Eps returns 14 clusters, the 20m Eps returns 19 clusters).
With the 10m Eps there is more noise and the cluster is more focussed, covering 2 junctions instead of 3. The choice of Eps is therefore critical.
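Because core points are flagged purely by comparing each point's 4-dist with Eps, a modest change in Eps can change which points are core, and so which points end up in clusters or as noise. The sketch below illustrates this on made-up coordinates (not the Heatons North data); k_dist and count_core are hypothetical helper names.

```python
import math


def k_dist(points, k=4):
    """Distance from each (x, y) point to its kth nearest neighbour."""
    result = []
    for i, (xi, yi) in enumerate(points):
        d = sorted(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(points) if j != i)
        result.append(d[k - 1] if len(d) >= k else math.inf)
    return result


def count_core(kdists, eps):
    """Number of core points, i.e. points whose k-dist is <= eps."""
    return sum(1 for d in kdists if d <= eps)


# Hypothetical example: six points spaced 6 m apart along a straight road.
road = [(x, 0.0) for x in range(0, 36, 6)]
kd = k_dist(road)
print(kd)                  # [24.0, 18.0, 12.0, 12.0, 18.0, 24.0]
print(count_core(kd, 10))  # 0
print(count_core(kd, 20))  # 4
```

With Eps = 10 m none of these points is core, so all of them are noise; with Eps = 20 m the four middle points become core and form a cluster.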
A larger view of the 10m results can be seen here.
Summary
My implementation of DBSCAN will not be the most efficient (the full distance array alone grows with the square of the number of points), but that doesn't matter. It's the implementation of WPS that I'm interested in.
The next stage is to implement the algorithm as a WPS (Web Processing Service). The first step will be to look at the input parameters of the WPS, i.e. how can I make the accident data, served via WFS (Web Feature Service), available to a web service?
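One possible route, sketched here under assumptions: WPS allows a complex input to be passed by reference, as a URL that the service fetches itself, so the accident data could be handed over as a WFS GetFeature request. The endpoint, layer name and helper function below are hypothetical; the parameter names are the standard WFS 1.1.0 key-value ones.

```python
from urllib.parse import urlencode


def wfs_get_feature_url(base_url, type_name):
    """Build a WFS 1.1.0 GetFeature request URL (key-value pair encoding)."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = wfs_get_feature_url("http://example.com/geoserver/wfs",
                          "accidents:heatons_north")
print(url)
# http://example.com/geoserver/wfs?service=WFS&version=1.1.0&request=GetFeature&typeName=accidents%3Aheatons_north
```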