Observation Accuracy Experiment v0.4 is Underway

We delayed Observation Accuracy Experiment v0.4 by a week so as not to distract from City Nature Challenge 2024. We're asking contacted validators to assess the sample by May 13.

We made 4 changes to this experiment from v0.3:

1. Validators are now matched to sample observations based on past ID behavior in the same country rather than continent
Previously, we matched a validator to a sample observation if they had at least 3 improving IDs within the same continent. We're now requiring at least 3 improving IDs within the same country to try to better match observations and validator experience. If we couldn't find any validators for a sample observation within the country, we expanded our search to the continent and then globally, though this was rarely necessary. The samples are getting a bit large for some validators, so we hope it's not becoming a burden. Please let us know.
2. We added a disclaimer to the message to comment here rather than replying to the message
We've been receiving a large number of responses to the messages we send to contact validators that we aren't able to properly read and respond to. We added a disclaimer to the message with a link to this blog post asking people to post their questions and feedback here rather than replying to the message.
3. We added more details about how to view your sample when viewing the message within the Android app
Some people trying to access the message from the Message section of the iNaturalist Android app have been having trouble navigating to the sample. We included more details explaining how to do this.
4. We added a search parameter to view observations included in an experiment sample
Now that we're sampling 10,000 observations, clicking bars on a completed Experiment page is limited to exploring the first 500 observations. Longer term, we plan to make improvements to that page. But for now, similar to how projects work, you can construct Explore URLs using observation_accuracy_experiment_id to see the sample of observations included in an experiment. For example, here are URLs for all the experiments so far (remember that the version, e.g. v0.4, isn't the same as the id, e.g. 5):
- v0.1
- v0.2
- v0.3
- v0.4
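As a rough illustration of how such an Explore URL could be put together, here is a minimal Python sketch. Only the observation_accuracy_experiment_id parameter and the example id 5 come from this post; the helper name experiment_explore_url is hypothetical, and combining the parameter with other Explore filters (such as the place_id=7207 mentioned in the comments below) is an assumption rather than documented behavior.

```python
from urllib.parse import urlencode

# Base Explore URL on iNaturalist.org.
EXPLORE_BASE = "https://www.inaturalist.org/observations"


def experiment_explore_url(experiment_id, **filters):
    """Build an Explore URL for the sample of one experiment.

    experiment_id is the experiment's id (not its version label).
    Extra keyword filters are appended as-is; whether they combine
    with the experiment parameter is an assumption.
    """
    params = {"observation_accuracy_experiment_id": experiment_id}
    params.update(filters)
    return f"{EXPLORE_BASE}?{urlencode(params)}"


# The post gives 5 as an example id.
print(experiment_explore_url(5))
# Hypothetical narrowing of the same sample to one place.
print(experiment_explore_url(5, place_id=7207))
```

Keeping the experiment id as the first parameter makes it easy to paste the same link and add your usual taxon or place filters by hand.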

Other than these four changes, the design and logistics of this experiment are the same as v0.3. As before, the Experiment page is live but the results will be updating daily and won't be finalized until May 13. At that time, we'll update this post with more discussion of the results.
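The country-first matching rule described above can be sketched roughly as follows. This is not the code used for the experiment: the threshold of 3 improving IDs and the country, continent, global search order come from the post, while the function name find_validators, the dictionary field names, and the simplification that taxa must match exactly are all assumptions made for illustration.

```python
from collections import Counter

MIN_IMPROVING_IDS = 3  # threshold stated in the post
SCOPES = ("country", "continent", "global")  # search order stated in the post


def find_validators(obs, improving_ids):
    """Return (scope, validators) for one sample observation.

    obs and each improving ID are plain dicts with hypothetical keys
    'taxon', 'country', 'continent' (and 'user' for IDs). Exact taxon
    equality is a simplifying assumption.
    """
    same_taxon = [i for i in improving_ids if i["taxon"] == obs["taxon"]]
    for scope in SCOPES:
        # Count each user's improving IDs within the current scope.
        counts = Counter(
            i["user"]
            for i in same_taxon
            if scope == "global" or i[scope] == obs[scope]
        )
        validators = sorted(u for u, n in counts.items() if n >= MIN_IMPROVING_IDS)
        if validators:
            return scope, validators  # stop at the narrowest scope that works
    return None, []


def make_ids(user, country, continent, n):
    """Create n made-up improving IDs of Odonata for one user."""
    return [
        {"user": user, "taxon": "Odonata", "country": country, "continent": continent}
        for _ in range(n)
    ]


# Made-up ID history: ana and ben reach the threshold, cy does not.
history = (
    make_ids("ana", "US", "North America", 3)
    + make_ids("ben", "CA", "North America", 3)
    + make_ids("cy", "US", "North America", 2)
)

us_obs = {"taxon": "Odonata", "country": "US", "continent": "North America"}
mx_obs = {"taxon": "Odonata", "country": "MX", "continent": "North America"}
print(find_validators(us_obs, history))
print(find_validators(mx_obs, history))
```

In this toy data the US observation is matched within the country, while the Mexican one falls back to the continent because nobody has 3 improving IDs in Mexico.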

Thanks again to all the validators we contacted for participating in this experiment. We wouldn't be able to conduct these monthly audits of iNaturalist observation accuracy without your help!

Results (added 05/13/2024)

The results of this experiment were very similar to the other experiments. The average Research Grade accuracy (fraction correct) was 95%. You can explore the results, including clicking through bar charts to observations, here.

From all 4 experiments we've conducted, we've now assessed 22,000 observations including 12,464 Research Grade observations. The graph below uses Research Grade observations from this combined sample to estimate accuracy subset by continent, taxon group, and rarity (<100 observations is rare) and sorted by the uncertainty (95% confidence intervals). We now have much better estimates than we had before this experiment, and for many of the common subsets (black) we now have large enough samples to get confident estimates. But for all of the rare subsets (orange) our sample sizes are still too small to be confident in our estimates. As discussed in the thread below, we will probably have to design an experiment with a non-random sample targeting rare taxa to include enough of them to reduce the uncertainty in the accuracy estimates for these rare subsets.
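To see why small subsets give wide 95% confidence intervals, here is a minimal Python sketch. The post does not say which interval method was used, so the Wilson score interval here is an assumption, and the subset counts are made up purely for illustration.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """95% Wilson score interval for a proportion (z=1.96).

    This method is an assumption; the post does not state how its
    confidence intervals were computed.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical subsets with the same 95% observed accuracy.
for label, correct, n in [("rare subset", 19, 20), ("common subset", 1900, 2000)]:
    lo, hi = wilson_interval(correct, n)
    print(f"{label}: n={n}, accuracy={correct / n:.2f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Both made-up subsets have the same observed accuracy, but the interval for the smaller one is much wider, which is the pattern the rare (orange) subsets show.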

Thanks again to everyone who participated in this experiment! We know this was a busy time on the heels of City Nature Challenge and very much appreciate your helping improve these accuracy estimates.

Posted on May 07, 2024 at 06:48 PM by

loarie

Comments

Another one? And just after a solid week of identifying for the City Nature Challenge - I don't think I ever want to have to do an ID again!!!!!

Posted by tonyrebelo 12 days ago

Thanks for adding the - do not reply - disclaimer.

Posted by dianastuder 12 days ago

Thanks @tonyrebelo - we very much appreciate all the IDing help. We know the timing is not great given City Nature Challenge. Thanks for all you do

Posted by loarie 12 days ago

@andresvila same here! i just got one little observation of a captive donkey from 6 years ago... i wish i could help more

Posted by imareallygoodphot... 12 days ago

I identified 13 species of the order Odonata, but I don't know how good a test this is, as the majority of them were among the most easily identified species of dragonflies, mostly from eastern United States. Much better tests would be to dive into the tropics if you want to know what percentage of species are being correctly (or at all) identified. Perhaps you are doing that also.

Posted by dennispaulson 12 days ago

I wonder why I got 53? Will try tomorrow when our internet is not as burnt out as I am.

It appears that the plea page must be responded to, in order to turn off the red mail flag.

Posted by arnim 12 days ago

My set this time matched what I identify much more! I'm a little sad I only got to identify two, but still happy I could help!

Posted by tcriley 12 days ago

@arnim - a message just needs to be viewed for the red notice to go away, it doesn't need to be replied to.

I got my own observation for one of the ID's. Not sure what to do about that, but I did not ID my own observation.

Posted by coreyjlange 12 days ago

Three of the nine I received were observations I had already made identifications on.
Two of those three were ones on which I made the "Improving" ID to the currently research-grade taxon.
How should I deal with this?

Posted by amr_mn 12 days ago

I also got a bunch that I had already added IDs to. I assume we don't need to add an ID again?

Posted by davidenrique 12 days ago

That's right. If you've already ID'd an obs and you still stand by your ID, please skip. Otherwise, update your ID. Thank you!

Like last time my set matched my IDing behaviour well.. maybe even a bit better now. I did 23 observations and the number is fine for me.. but I also usually ID much more in a day...

To the people wanting to help out more.. click on the "v0.4" under number 4 above and set your filter... I will probably check out the other spiders as well

Posted by ajott 12 days ago

A couple native species in the set this time yay!

Posted by wildskyflower 12 days ago

Great. One suggestion I have is to maybe include the suggestion to NOT say an ID is part of the experiment unless the ID is broader than the community taxon. I keep getting notifications of people adding agreeing IDs because they comment that it's part of the experiment. The completely unnecessary notifications are pretty annoying. It's not a huge deal, but a suggestion in the message might ameliorate the issue.

I had one obs that was an images showing different species case. I did the usual, IDed (disagreeing) as common ancestor, left a comment and added my (the first) 'no' vote to 'Evidence related to a single subject'.

Posted by richyfourtytwo 12 days ago

Done. Phew! Is there anyone else who finds the identify format incredibly difficult to use?

Posted by gillydilly 12 days ago

The new strategy for matching observers to observations seemed to be an improvement this time, at least for me.

However, it is less and less clear to me what purpose this experiment serves and what can be learned from it.

What does the "accuracy" being measured really tell us? The likelihood that any randomly selected observation will already have an ID that experienced IDers on iNat agree is correct? Whether a randomly selected observation can still have its ID refined? These seem to me to be fairly trivial.

The inclusion of non-research-grade observations in the last couple of rounds means that we are also looking at observations that do not yet have a community consensus -- i.e., I have several where the only ID is that of the observer, or it is a general ID like "flies" that could probably be refined by a specialist (which I am not), or it is at a high level because there is a wrong ID by an unresponsive observer and not enough people with relevant expertise have looked at it to override that ID.

What are we measuring in such cases? If I ID it as "flies" does that indicate it cannot be refined further? If I add a refining ID or add the additional ID that overrides a disagreement, what does this tell us except that the observation had not yet been reviewed?

I'll look at my set and add IDs to those where I can provide a meaningful contribution, but I honestly don't see any point in adding additional confirming IDs to common local plants or honeybees. There are lots of other ways I can spend my time that would do more to improve iNat's data.

Posted by spiphany 12 days ago

Thanks @ajott - but just to be clear, an obs won't be considered validated unless the IDer meets the threshold of at least 3 improving IDs for that taxon. But for sure chiming in and sharing expertise on obs will likely help get consensus around their ID even if people don't meet the official validator criteria.

Done!

Posted by xris 12 days ago

I had 69 observations to ID and went through them, corrected some, and mostly supported others' ID's. Can't wait for the results!
And good thing to put not to comment (I can see hundreds of people, including myself, commenting if it wasn't mentioned).

Posted by huttonia 12 days ago

How many of these are there going to be?

Posted by jasonhernandez74 12 days ago

surprised at the people getting dozens of observations to ID -- I feel left out, having just the 5!

Posted by jamie-aa 12 days ago

We can leave it without an ID, since it goes to Casual with the new DQA - was in the info from iNat.

(I did the usual, IDed (disagreeing) as common ancestor) @richyfourtytwo

I had 125 observations to review. One was of a Surf Scoter that had sat around as 'Needs ID' for 2 years! It's now RG. The other observations were a good mix of waterfowl, lady beetles, and other groups I occasionally ID. I was surprised by how old the observations were. Some were from 10 years ago!

Posted by that_bug_guy 12 days ago

I got 6 observations which seems to be the average.

One of my observations was Rubus armeniacus, which is a hotly debated taxon! Maybe someone else will say it's R. bifrons and the iNaturalist curators will try to sort it out a bit more. It's really a mess.

Posted by ccoslor 12 days ago

This time I got quite a few of casual observations (no location, often no date as well, from accounts with very few observations and weren't active recently). Often the first ID was from a few hours ago, so I suppose they started out as unknown before this experiment? But honestly, I'm not sure how much can be gained (in terms of this experiment) from sticking yet another broad-level ID on those.

Posted by helmwige 12 days ago

I only got 2 observations to ID and I’d be happy to do more. Wish I got a couple of South American Mangora, as they are my favorite, but oh well. Maybe I didn't meet the requirements for those :(

Posted by iluvspiders 12 days ago

@iluvspiders, maybe next time :)
(I had only very few my first time on one of these experiments.)

@davidenrique agreeing IDs with included comments show up in your notices? My apologies, I didn't realize that would happen! I thought documenting all IDs related to this experiment would be valuable (even though it was more time consuming than just clicking the agree button); without that, it could be confusing to see multiple new IDs being added to observations that had been research grade for years, and those added IDs could be interpreted as trying to run up one's ID totals.

Posted by larry216 12 days ago

My problem is all the notices about this thread -- eight within an hour. I'm unsubscribing; I will check back later to see if my question was answered.

Several of the observations I was asked to ID in Observation Accuracy Experiment v0.4 were out of my state. I usually do not attempt things out of state because I do not know as much about what is possible and there are plenty of observations to work on from my state.

Posted by marykeim 12 days ago

@larry216 Look up some famous observations, where there are dozens or more identifiers, all saying the same thing. Adding to their collection of species maybe? eg.
https://inaturalist.nz/observations/50977836

I have done my duty!

Posted by diegoalmendras 12 days ago

After all that IDing for CNC, I told myself I would take a break. But then I got my first request to participate in this project. Plan for break summarily ditched.

Posted by lappelbaum 12 days ago

@huttonia Yeah, maybe next time. Turns out there weren’t any South American Mangora in the set of observations at all, so I got unlucky lol.

@larry216 Yep. I have notifications enabled except for agreeing IDs (with over half a million ID's, I'd drown in notifications if I got them for agreeing IDs too!). I don't want to miss dissenting IDs or comments, which are often important, so I do get notifications for IDs with comments. I totally get the desire to be transparent about the reason for the IDs, but all the notifications do get a little overwhelming lol.

Is it acceptable to ask the observer for more info? I have one to ID that is a plant that seems to be something that does not grow naturally where the observation is sited (even though obscured). I have asked them if it is a cultivated plant (not marked as such), but I don't know if that is appropriate for this experiment.

Posted by vireyajacquard 12 days ago

@davidenrique I wish there was a way to turn off notifications for Observation Fields. None of the available mechanisms seem to affect them, including muting the user adding them to hundreds, thousands of my observations.

@xris I know this isn't what you want per se, but there IS a way to restrict who adds observation fields to your observations. You can go to your account settings > Content & Display and change the "Who can add observation fields to my observations?" option to "curators" or "only you".
I don't recommend doing this, as observation fields can be useful, but that's an option if it's really affecting you too much.

@vireyajacquard Yes that sounds like a normal part of the ID process for a high confidence ID.

@davidenrique I don't mind them doing it. We've corresponded about their Project, and I want to support it with my Observations. However, I have a lot of Observations that are similar with respect to their selected Observation Fields. I'm aware that they're doing this work. I don't need to be notified about it every single time!

Muting someone should prevent you from being notified when they add an observation field to an observation you're following.

Posted by tiwane 12 days ago

@tiwane It doesn't when it's my own Observation.

@xris hmm ok, that may be a bug. Can you email help@inaturalist.org with info, if you have a moment?

Done and done! Was able to ID most of them but there was a really tricky Sterna tern that was eluding me, so had to leave it at genus. Thanks for reaching out, will be happy to help (if I can) with any further rounds of testing!

Posted by vireosylva 12 days ago

@tiwane Sent!

It took me a while to figure out that I had to edit the url from "inaturalist.org" to "inaturalist.ca" before I could even log in. Then the same thing happened again when I clicked on the link in the original message, to get to this blog to see if anyone else is having the same problems. Pretty annoying to have "incorrect password" returned over and over when I KNOW I'm entering it correctly (for .ca, not .org, it turns out).
Greg

Posted by gpohl 12 days ago

had around the same number of observations to review, but they were more relevant this time. It went faster for me and was also more satisfying.
It was notable that each observation I reviewed already had 2-4 others having added IDs in the last number of hours. I'm thinking that it may help a little to limit the number of people asked to review a given observation... there's pros and cons, but above say 4 qualified people it really does seem.... inefficient.

Posted by astra_the_dragon 12 days ago

I have been asked to ID an observation which has the location set to Private. However, without that info I can't ID to species level. Should I still attempt to ID? Any other instructions on how to deal with this case?
Thanks!

Posted by phoekman 12 days ago

What to do with a known undescribed species? It is identified to genus, but will show up only as an agreement at generic level, when most of the identifiers know it to species, albeit a still undescribed one.
I've only seen one case so far (my 48 plus about a dozen that have higher rank IDs posted for the experiment on my and other observations) so about 1 in 80 on biased sample.
We cope with these cases in our community by adding a field (https://www.inaturalist.org/observations?verifiable=any&place_id=any&field:New%20species%20reference%20and%20name=Strychnos%20sp.%20nov.), but the experiment won't know this.
I doubt it will be the only case in Exp 4, but probably irrelevant overall (<1%) - or perhaps not for some areas?
Anyway, as usual a fun exercise: only a few observations that I had to pass from other parts of the world or groups I don't know - most were within my domain.
Thanks for the opportunity to participate.

Would be great to have a 'multiple observations of the same organism' box to select in the Research Grade Qualification list. I often ID unknowns and there are often numerous observations from one account of the same organism. I leave a comment when I come across it to inform them of what to do but a lot of the time it's ignored. It must surely skew the data when 10 observations of the same organism all reach research grade?

Posted by kmackau 12 days ago

Please rather word it as "duplicate observations of the same organism"

@kmackau make a feature request.
I won the multi-species battle, you tackle this one?
(Meanwhile I have a copypasta for that)

'duplicate observations of the same organism' is much better wording @tonyrebelo :) How do I make a feature request @dianastuder ?

Posted by kmackau 11 days ago

Only one for me, as I am still not able to ID much. And I believe that it was two photos of different species. Does this mean that I should go through my observations to check for the same? Ha

Posted by geeseinflight 11 days ago

Great! That's the first experiment when I got observations fitting my ID group perfectly.

Posted by bagli 11 days ago

Think about it - long, and hard @kmackau . Write the text, and edit.
Then go to the Forum and - add feature request. For another new DQA? That would be the elegant guidelined solution (especially if it also tips to Casual, until resolved by the observer)

PS - new route (mine was in the email to tiwane days) - https://forum.inaturalist.org/t/about-the-feature-requests-category-please-read-before-posting/69

My copypasta for these problem obs

See also (insert other obs numbers here, which is ... tiresome to do, but useful for the next identifier, and an eventual techie / DQA solution)
~
Please combine multiple pictures of the same individual of a species
https://forum.inaturalist.org/t/how-to-turn-multiple-observations-into-a-single-observation/9838

Posted by dianastuder 11 days ago

Done.

@loarie about 20 or 30 obs would be fine - but 2 pages is one too many.
Perhaps you can tweak to offer a few more across identifiers.

@dianastuder - looks like we shared quite a few observations to check. Not surprising perhaps, but perhaps one way to rationalize number?
I don't mind the numbers. Would be happy to do, say, 100, if it was my interest group. One observation in my set (https://www.inaturalist.org/observations/21582820) took as long as all the others combined: a Giraffe in a zoo in Bakersfield - raised hackles: sun high for 19h22, no locality, two observations on at 19h30 is gloaming, no website zoo, farm or petting place shows Giraffe in the area. Looks like Maasai, but I wanted to check against zoo records, but cannot place it. [I have only identified the two subspecies of Giraffa giraffa, but this was only ID'd to a deprecated generic level because of a swap following the creation of 4 species from previously 1 sp and 7 subspecies, so I guess I was included for a California ID; my other way-out one was a sterile Pterocarpus to genus in Mexico, but their species are different from our African ones, so I just Dicotted it].

Posted by tonyrebelo 11 days ago

@gpohl your same password should work on iNaturalist.ca and iNaturalist.org. If not, please contact help@inaturalist.org.

Posted by carrieseltzer 11 days ago

I'm confused, I clicked on v0.4 and 9k observations came up. Am I doing something wrong, and does it matter if I ID those? I haven't had anything sent to me via email.

Posted by cs16-levi 11 days ago

I would love to be able to use the algorithm that matches me to observations for this experiment more generally. It would be neat to have a custom page of observations in need of research grade that have been matched to me as a possible IDer, based on my past IDing behavior. I think this would encourage me to ID more. Maybe there are good reasons not to do that, but just thought I'd put the idea out there!

Posted by teresap 11 days ago

Please remove me from inclusion in any future versions of the accuracy experiment, I think that the experimental design where I am asked to ID plants in areas where I would NEVER id (Mexico, California, Arkansas for example from this round of 108 observations (5 hours)) means that the experiment is deeply flawed, even with me opting for the high level taxonomic groups that eliminate my input- while wasting my time.

Posted by patswain 11 days ago

Since we have a link to this whole batch, may we filter to our taxon and / or location and see what is there for us to ID? Or rather NOT this batch until after the deadline?

@loarie A few points of clarification about ajott's question:

1.) If I have three improving IDs to a genus in a particular country, and an observation that I had previously ID'd to genus is part of the experiment, would I still count as a validator if I change/refine my ID (even though I was not sent it originally)?
2.) If the answer to 1.) is yes, when is validator eligibility determined/is it fixed? For example, I notice that before this experiment it appears I had exactly 3 leading and 2 improving IDs of genus X in country Y. One of my leading IDs was on an observation selected for the experiment, so now that leading ID has been converted to improving. Would that make me an eligible validator for that genus in that country now? If so, if I refine that one ID to agree with the community taxon, would I lose eligibility because my active ID would be supporting?
3.) On another observation I was sent, it seems like there is a chance the original community taxon was either wrong or too confident and might either be backed up to a higher level or actually change to a different taxa. Would a change to the community taxa to one where many of those of us who were sent it would not be eligible as validators cause us who were sent it to lose eligibility as validators, and would others gain it? (this was my main question, the first two questions were just setting it up; it seems important because we almost certainly are more qualified to say what it is not than what it is)

Depending on the answer to question 3, I think it could be an interesting follow-up experiment to re-send just the subset of observations that were scored as 'incorrect' or maybe also 'uncertain' to a larger set of potential validators (perhaps a higher target redundancy and inclusion of a more expansive set of validators from the common ancestor/new taxa), to see if the community taxon changes further. This could be an interesting cross-check of the validity of the validators in disagreements, and might help refine the ambiguity of the 'uncertain' category.. Obviously you would archive a frozen version of the results before re-sending anything, and such a set would not be random.

Posted by wildskyflower 11 days ago

I know this is not a permanent feature, but I like this experiment. It would be a cool feature to opt into monthly messages with a link to a "random selection" of observations that fit your identification habits. Could be a nice way of getting older observations that for some reason have been overlooked in front of identifiers again.

Posted by eyekosaeder 11 days ago

@eyekosaeder Why not auto-email yourself once a month with a link? such as:
https://www.inaturalist.org/observations/identify?order_by=random&place_id=7207 (i.e. Needs ID random for Germany [or other area]). Skip those that don't interest you.

@tonyrebelo well, now that's amazing. I didn't know it was possible! Thank you! :D

My subset size jumped to 82 with this one and most seem to be good matches for what and where I usually ID. There's one outlier in a different country but I have in the past indeed added IDs there so that's just icing on the cake. And even more fun: One of my own observations is included in this sample. Awesome to see the IDs pour in! It's RG at genus level (cannot be improved) and so far nobody has dared to take it to species yet (which is good, because I honestly don't think it can be narrowed based on the evidence provided).

Posted by annkatrinrose 11 days ago

I got only 6 Chironomids. None of them were Id-able to genus let alone species.

Posted by zoology123 11 days ago

My subset has 37 observations to ID. Piecemeal compared to what I usually identify. So just keep 'em coming. I have opted out of the CNC treadmill. In central Alberta it is barely spring when the CNC happens. I funnel my resources to more meaningful projects (such as this one) instead of IDing little snippets of greenery or leafless shrubs and trees. The observation subset is now fairly well matched to my identifying skills, which is great. From what I see now, most ID revisions happen to substandard observations that have blurry images, include more than one species, or lack locality data. It is too bad that such observations influence the end results negatively. I am looking forward to the outcome of this experiment.

Posted by matthias22 11 days ago

Thanks @dianastuder - sounds terrifying! I'll have a look...

I felt confident enough to provide a species-level ID on 11 of the requested observations, genus/section-level ID on 4 observations, and I did not have the confidence to provide any ID for one. Thanks for letting me help out!

Posted by ljfekontanis 11 days ago

I do think some countries such as the United States are so large that it makes this a bit hard. I was getting observations from Washington State based on things I IDed in California, and while the species in question is in both places, the look-alikes are different from one place to another. For larger countries it might be helpful to break it up more than by country.

Posted by charlie 11 days ago

I am curious about something that came up. I am wondering how this experiment handles things like this.

https://www.inaturalist.org/observations/4580033

This observation is a cottontail rabbit track, confirmed 7 years ago by several certified wildlife trackers. However, the experiment 0.4 asks folks who have previously identified cottontail rabbits to identify this track. I think the AI doesn't have fine enough sorting capabilities to sort out track images from images of the actual animals. So, people who may not necessarily be trained in tracking are being asked to identify something that's outside their wheelhouse. (A track rather than an animal.) That in turn lowers the quality of the original observation from "cottontail rabbits" to "mammals." So, in effect, this has made the classification of this observation drop to a broader category than it originally was. Is this something that the folks doing the experiment are testing for? I'd be interested in seeing the results of this experiment. Or maybe an experiment designed specifically for animal tracks and sign?

I identified all 14 observations I was asked to look at. Most were the animals themselves. Curious if others got a mix of types of observations too? I did get a few tracks in the first rounds.

This is fun. Keep it up. Interesting thoughts are coming to mind out of this experiment.

Posted by beartracker 11 days ago

@beartracker I don't understand how adding an ID of mammals to a RG rabbit obs makes it lower quality. The community ID stays the same unless you explicitly disagree with the lower level IDs.

Posted by lappelbaum 11 days ago

That observation is still "Cottontail Rabbits". No-one has disagreed, so that hasn't changed.

Posted by vireyajacquard 11 days ago

I guess I should have worded that different. Perhaps diluted would be a better word. I am trying to figure out a better word.

@beartracker you can experiment with how the CID algorithm reacts by adding and removing your ID - a broader ID without disagreement has no effect on CID.

Not sure this really helps quality control, when I am offered previous IDs of three experts who all agree with each other. I confess to agreeing reflexively and only actively revisiting a couple a minute later because I felt guilty. In some parts of the scientific world, only blind IDs would count for quality control. Why not pick examples which are only just RG, give neither observation references nor prior IDs nor comments, and ask reviewers to do blind IDs?

Posted by ditchingit 10 days ago

@ditchingit the 'non-blind'/transparent ID process here is the same as for every museum and herbarium specimen that gets IDed by a second, third, etc person: each new identifier has full access to/can see previous det slips, including who made the IDs and what those IDs were. I understand your point, but iNat's approach is the norm rather than the exception

Posted by thebeachcomber 10 days ago

@charlie That certainly is tricky.. it is even not sooo much about size of a country, but also about how diverse its habitats are. You might be able to ID at the seashore but not so much on the mountain top. However, in the end it is totally fine to not (or just roughly) ID observations one does not feel comfortable with

Posted by ajott 10 days ago

I was asked to participate despite being a generally broad-only identifier, not sure I added anything of value haha. I got a couple blob style photos I couldn't make out and would ordinarily just hit reviewed and skip, had to put a non disagreeing "life" ID on one that already had cID at order which I'm sure will look very silly to anyone looking back at the obs without context.

Posted by alloyant 10 days ago

I was sent a sample of just one and it was in another country. Not familiar with plants there.

Posted by bobbie79 10 days ago

Done. This time was much easier. I had 17 observations mostly from regions and species I am familiar with. IDing was much easier than the last time. I had one observation without location which did not make much sense to me.
However, it was great fun for me to participate!
Is there anything wrong if I would use the filter to go through observations in my area and ID them?

Posted by misumeta 10 days ago

Got 70+ observations this time, actually liked it, but it felt a little like an actual id session. ':D
Caught the wrongly ided bird, quite a slim chance for that!

Posted by marina_gorbunova 10 days ago

Thanks for the great discussion and for participating. We've already had over 3k validators respond and over 94% of the sample validated. We can't tell you how grateful we are to be able to engage the expertise of so many of you to do this work. This incredible community of expertise really is what makes iNaturalist so special and we are so appreciative of your participation.

So far, the results are looking almost identical to past experiments with the research grade subset having 95% correct, 1% incorrect and 4% uncertain.

A lot of the conversation here is about whether the average accuracy of the iNat dataset is the right thing to measure. Given how biased it is towards common species, would it be better to orchestrate experiments to estimate the accuracy of rare species? I wanted to talk a bit about this topic.

The graph below shows dragonflies and damselflies (order Odonata) on iNaturalist. There are about 6000 described species of Odonata. On iNaturalist we've observed almost 4,000 of them. The black curve shows the number of observations by species sorted from most observations (Blue dasher) on down. This is on a log scale which shows you just how biased the iNat dataset is towards common species. Blue dasher has almost 100,000 observations, but only ~1000 species have at least 100 observations. Pale bluet is an example of a species with 100 observations. Many species like Dull Jewel have only 1 observation. Others like Swamp Groundling still have 0 observations.

The vertical line separates this subset of ~1k on the left side of the graph with >100 observations (let's call them common species) from the subset of around 5k on the right side of the graph with <100 observations (let's call them rare, data-deficient species). This also roughly corresponds to the set of species in the Computer Vision and Geomodels vs those that don't have enough data. The pink line shows the number of IDers with at least 3 improving IDs (candidate validators) for each of these species. The ratio of candidate validators to observations is about 5%, i.e. a species with 100 observations has 5 candidate validators.

One of the interesting things about iNaturalist is that part of our mission is about getting lots of people connected to nature and part of our mission is about scaling global biodiversity monitoring. The former is focused more on the left side of the graph. Most of the dragonflies most of the people using iNaturalist are going to see are going to be species like Blue Dasher, and it's really important that iNaturalist works well and provides good information about these species. The latter is focused on the right side of the graph: most of the discoveries and important contributions to science and conservation are probably coming from this frontier of rare species.

Much of our philosophy for bringing these rare species into focus is just to grow iNaturalist as a whole. That lifts the whole curve. As iNat grows, yes we'll get many many more encounters with Blue Dasher (which is really important for our mission of building broad advocacy for nature), but we'll also get more rare species too.

But because these experiments are using a random sample of iNat observations, they are biased towards the left side of the graph. For example, in Experiment v0.3, there were 134 common Odonata in the sample, of which 128 were correct, giving us an estimate of 95% with pretty high confidence. There were only 3 rare, data-deficient observations in the sample. All 3 were correct, but this estimate of 100% is very uncertain because the sample is so small.
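To illustrate why 128 of 134 gives a much firmer estimate than 3 of 3, here is a minimal sketch using a Wilson score interval at roughly 95% confidence. The choice of interval and the function name `wilson_interval` are assumptions made for illustration; the experiment itself may quantify uncertainty differently.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Return (low, high) bounds of the Wilson score interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed level).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Counts quoted above for Odonata in Experiment v0.3.
common = wilson_interval(128, 134)  # common species
rare = wilson_interval(3, 3)        # rare, data-deficient species

print(f"common: {common[0]:.3f} to {common[1]:.3f}")
print(f"rare:   {rare[0]:.3f} to {rare[1]:.3f}")
```

Under these assumptions the common-species interval spans roughly 91% to 98%, while the rare-species interval spans roughly 44% to 100%, which is the sense in which the 100% estimate is low confidence.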

It sounds like there's quite a bit of interest about trying to estimate the accuracy of observations of these rare species. We can design a non-random sample that will do that. With fewer candidate validators, it will be harder to orchestrate and we may need different validator criteria. But we agree it would be very interesting and important towards better understanding the right side of the graph.

Posted by loarie 10 days ago

@marina_gorbunova: Wow!

@misumeta I can't say for sure without checking, but it feels like each time I get more and more, not complaining, initiative leads to that. :)

I completed my 105 ID's, but possibly not as well as I often do with only the new observation that I just looked at to think about. While I knew that I had until May 13 to complete them, I wanted to get them all done. I usually ID a relatively large range of organisms, but mostly lowland, terrestrial organisms, and mostly in my "Puget Trough" bio-region, as I have delineated it. Appropriately I didn't get any salt water organisms from the Puget Sound itself to ID, and didn't notice any alpine organisms. It was a bit frustrating on a few ID's in the experiment group to be asked to ID things across the continent from me, where I don't know the look-a-likes that occur there, but not too bad. The majority of observations I was given to ID were in, or close to, my Puget Trough region. As I knew I didn't have to give more specific ID's than I could do in the time I felt like spending on it, the cross-continent observations I was given weren't too bad.

Normally I vary a lot in how much time and effort I take to make an ID. A high percentage I may do in a second, but when I feel like it, I might take up to hours to work to get a good ID on something I don't already know well, and maybe that I want to know better. Any stats on how well I ID any group would be determined from the ID's I made after an unknown amount of time. I'm sure the average amount of time I spent per ID on the experiment group was less than the average time I spend otherwise, when I'm not aiming to complete the next 105 ID's. I almost felt like iNaturalist only wanted my quicker ID, so when I spent a bit more time on an ID I wasn't sure I was following instructions or expectations. For example, I spent some extra time going through the Similar Species section to make a specific cross-continent ID on a species I would have done quickly if it were in my Puget Trough area.

I also find I need to do a lot of review all of the time to keep up with my knowledge in any given group, some more than others, and I don't expect the sample I was given to ID reflected how up to date I was reviewing any given taxon group. For example, I used to do a lot more work every Spring and Fall, reviewing all of my fungi to keep up with my fungus ID knowledge, but haven't been inspired to do that much work reviewing my fungi for maybe 2 - 3 years now, so my current ability to ID fungi may be less than my fungus ID stats would show.

I now hope that what I did helps determine how accurate iNaturalist ID's are. While I think iNaturalist ID's are generally pretty good, I don't count on a "Research Grade" ID to be reliable.

Posted by stewartwechsler 10 days ago

"this frontier of rare species"

@loarie then please reconsider destroying placeholder text. Your rare species are often carefully named in 'shall we use this as a placeholder for you?' Yes, thanks!
That is what I use to - flag for curation - please add missing species.

We have discovered that iNatters can get around that working as intended, if they opt out of CID. Then the observer's intention is respected. But only ONE iNatter has shown me that.

Posted by dianastuder 10 days ago

@loarie Scott, I read with great interest your analysis above of the Odonate sample. Randomized sampling is great but I have many of the same questions about commonness-rareness and biases in the validator population sample (selection bias, not ID or personal biases). Here are a couple of questions that popped into my head:

1. If I'm recalling correctly, on the Experiment 0.4 page, there's a graph showing the number of validators vs the number of taxa assigned. It seems that the number of validators is ripe for biases. Let me preface this by saying that, while taxon commonness and ease of identification are not necessarily correlated, I expect they are strongly associated. This will give rise to this conundrum: If there are on average more validators assigned for common (read: easily identified) species, then the accuracy statistic will be biased upward. Conversely, if a few specialists are all that are "available" to validate a handful of rare (read: harder to ID) species but apply their expertise to those groups with enthusiasm, that could potentially bias the rare species towards higher accuracy. Perhaps the small subset of validators selected for looking at and identifying rarer species may possess a non-random level of knowledge and/or dedication to IDing such observations. I know that iNaturalist is not inclined in general to "select" experts (e.g. curators) or gauge expertise, but for the Accuracy Experiments, there would seem to be a need to look into stratified sampling of validators under some such criteria, both for common and rare species.

2. Speaking of stratified random sampling, I'd like to hear more about the randomized sampling of observations to obtain a set for each experiment. Is a random sample of the complete pool of observations truly the best way to eventually gauge "accuracy" of IDs? To answer my Q1 above or to delve into other nuances of biases in commonness or rarity, it might be useful to stratify the sampling of observations on the commonness scale to look at stats independently in three or four gross levels (e.g. abundant, cosmopolitan species; regionally common species; regionally rare; local endemics).

3. It's clear that the task of validation is not shared equally among the pool of validators. Comments here and elsewhere show the range of attitudes--for lack of a better word--towards being the recipient of high or low numbers of observations to validate. Are validators with just a couple of observations more likely to delve into the nuances of IDs and provide stronger support for outcomes? Are overtaxed validators who have too much on their plate or for whatever reasons must minimize their validation efforts going to offer weaker efforts at validation choices? This would be a hard spectrum to delve into but I sense from the various comments on each experiment that it may be a concern.
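The idea of stratifying the observation sample by commonness, raised above, could be sketched roughly as follows. This is only an illustration: the bin names, the 10,000-observation threshold, the toy data, and the function names `stratum` and `stratified_sample` are all hypothetical, and the 100-observation threshold simply borrows the common/rare split described earlier in the thread.

```python
import random
from collections import defaultdict


def stratum(obs_count):
    """Assign a species to a commonness bin (hypothetical thresholds)."""
    if obs_count >= 10_000:
        return "abundant"
    if obs_count >= 100:
        return "common"
    return "data deficient"


def stratified_sample(observations, species_counts, per_stratum, seed=0):
    """Draw up to `per_stratum` observation ids from each commonness bin.

    observations: list of (obs_id, species) pairs.
    species_counts: total number of observations per species.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for obs_id, species in observations:
        groups[stratum(species_counts[species])].append(obs_id)
    # Small bins are taken whole rather than raising an error.
    return {
        name: rng.sample(ids, min(per_stratum, len(ids)))
        for name, ids in groups.items()
    }


# Toy data only; these counts are illustrative, not real iNaturalist figures.
species_counts = {"Blue Dasher": 95_000, "Pale Bluet": 100, "Dull Jewel": 1}
observations = (
    [(i, "Blue Dasher") for i in range(50)]
    + [(100 + i, "Pale Bluet") for i in range(5)]
    + [(200, "Dull Jewel")]
)

sample = stratified_sample(observations, species_counts, per_stratum=3)
for name, ids in sample.items():
    print(name, len(ids))
```

Accuracy could then be reported per bin rather than as one pooled figure, so that the many observations of abundant species do not swamp the few observations of rare ones.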

Thanks again for the superb analyses.

Posted by gcwarbler 9 days ago

An experiment that tells us that RG observations of common and easy to ID species are generally ID'd accurately is not very informative. It does not require monthly experiments to confirm the results.

There's a difference between "rare species" and "difficult to ID species". I actually suspect that the really rare (and unfamiliar) species are probably ID'd fairly accurately, because the average user does not know enough to suggest them, particularly if they are species not included in the CV.

While many commonly observed species are also easy to ID, there are plenty of species that are common but difficult to ID for one reason or another. For example, I've been reviewing European Xylocopa observations. These are big, conspicuous bees and consequently observed fairly frequently (23,000+ observations in Europe). In the areas where the ranges are known to overlap, the misidentification rate of RG observations (wrong species, or a species level ID where it should be left at genus or subgenus) is quite high. I haven't been keeping statistics, but I'm fairly sure it is well over the 5% error rate of the iNat-wide experiment.

While such taxa may not make up a large enough portion of the dataset to substantially affect the overall accuracy rate, this does not mean that misidentification of these groups (and lack of expert IDers) is not a problem. Most people are not using iNat's data set as a whole; rather, they are interested in specific taxa. It means very little to be told that most plants are correct if one is studying mosses or ferns or even one of the trickier groups of vascular plants (daisies anyone?). The fact that honeybees are generally correct is of little value for someone studying solitary bees. Etc.

For difficult taxa, a high accuracy rate is likely to be a reflection not of how well the average user manages to ID the observations, but of the existence of a handful of indefatigable IDers who have gone through and corrected all the observations. Talking about "accuracy" in an abstract sense fails to acknowledge how much this depends on the knowledge and effort of specialist IDers, or the fact that if these people were to stop IDing the accuracy of that taxon would likely decrease substantially. For these taxa, the community is not largely self-correcting, because IDing requires knowledge and skills that the average user does not have.

Posted by spiphany 9 days ago

I concur.
Rarer species, whether easy to ID or not, tend to be identified by specialists, or else ignored at some higher level (tribe, genus), unless similar enough to a common species to be misidentified.
These misidentifications may be an insignificant proportion of the common species, but may be a substantial proportion of observations of the rarer species.
Our specialist identifiers are the heart of documenting our biodiversity. Groups without specialists languish at higher taxon levels, as do groups urgently in need of taxonomic review.

Posted by tonyrebelo 9 days ago

iNat is currently incredibly slow; I have been trying to upload 50 observations for the last 3 days now. Thanks for the invitation to take part in the Accuracy Experiment v0.4. I will certainly consider such an invitation as soon as iNat restarts working at normal speed.

Posted by mreith 9 days ago

From Antilles? No problems in South Africa - just uploaded 217 in the last hour.

There is a general problem with images getting to "research grade" too easily. The following scenario occurs extremely frequently:
An observer posts an image they cannot identify. A second person offers an identification for the image. The observer then agrees with the suggestion, but actually has no personal knowledge to back that up. Bingo, the image is "Research Grade" but in reality only one person has offered an identification.

I suggest that where the poster has not initially identified the image, their agreeing with an offered ID should not count. Two independent IDs should be required.

Posted by acclivity 9 days ago

Yes, that happens too often

Posted by arnim 9 days ago

I worked through these very fast because I'm teaching a class. I hope I did OK. Most were very appropriate observations to ID. I did chuckle at the palm tree from United Arab Emirates, not a place this botanist from Oregon usually ID's for, but I have to admit that I did ID a number of photos there once, when there was a problem school class. Unfortunately, all I could do with the palm was Palm Family.

Yes, I did chuckle a bit at a Rubus armeniacus/bifrons observation. We argue over the best name, so hard to know what the name posted on it may mean.

Posted by sedgequeen 9 days ago

Help, what am I supposed to do with this one? Just leave it? :D
https://www.inaturalist.org/observations/88856075

Somehow I got some plants to ID as well this time, even tho I never ID plants, only terrestrial isopods.

Posted by naturestephie 8 days ago

Do these need to be done by the end of the 13th or before the 13th? "By the 13th" can be interpreted both ways.

Posted by rymcdaniel 7 days ago

I got 20, which was quite manageable, as I normally do a fairly high volume of IDs on ants and wasps as it is, the paper wasps in the dataset did slow me down a bit though, I do like IDing those but I don't know them off the top of my head so I have to scan through my field guide for each one, but overall it was not a burden at all

I am confused as to why I got obs that were not RG? I thought this was supposed to be measuring accuracy of RG obs?

Posted by insectobserver123 7 days ago

I hope by the end of the 13th - still have a few I wanted to check keys for but have to make it to the finish line for the semester first. Final exam week, graduation events over the weekend, and grades due by midday today has taken priority.

Posted by annkatrinrose 6 days ago

the deadline is the end of the day (2024-05-13 23:59:59 UTC) - thanks everyone for helping validate this sample!

Posted by loarie 6 days ago

I just want to point out that 23:59:59 UTC is not the end of the day in most time zones. Here in the eastern US, on daylight saving time, it is 1 second before 8:00 PM.
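For anyone wanting to check what the stated deadline (2024-05-13 23:59:59 UTC) means locally, here is a minimal sketch. The set of zones and the helper name `local_deadlines` are chosen for illustration only, and fixed UTC offsets are used as a simplifying assumption (they hold for these zones in mid-May but ignore daylight saving changes at other times of year).

```python
from datetime import datetime, timedelta, timezone

# Deadline as stated in the thread.
deadline_utc = datetime(2024, 5, 13, 23, 59, 59, tzinfo=timezone.utc)

# Fixed offsets in hours from UTC, valid for mid-May (illustrative selection).
offsets_hours = {
    "US Eastern (EDT)": -4,
    "South Africa (SAST)": 2,
    "Sydney (AEST)": 10,
}


def local_deadlines(deadline, offsets):
    """Convert a timezone-aware deadline into local wall-clock strings."""
    result = {}
    for name, hours in offsets.items():
        local = deadline.astimezone(timezone(timedelta(hours=hours)))
        result[name] = local.strftime("%Y-%m-%d %H:%M:%S")
    return result


for name, local in local_deadlines(deadline_utc, offsets_hours).items():
    print(f"{name}: {local}")
```

For US Eastern this gives 19:59:59 on May 13, matching the "1 second before 8:00 PM" observation, while in the zones east of UTC the deadline falls on the morning of May 14.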

Posted by insectobserver123 6 days ago

I received two observations to identify. Both had previously been identified by me, and I did not change my previous identifications.

Posted by malthinus 6 days ago

We've updated the post above with the results - thanks again everyone!

A focus on both: rarely observed and hard to ID (rare or not) species is certainly the way to go.
From my experience with insects in Africa, I see several problematic issues here:
(1) Observations are left at genus or family level or follow CV suggestions blindly because there are no experts active on iNat --> validation not possible due to lack of competent validators
(2) Observations are identified to species level because it has been identified on iNat to that species, but the taxon is in need of revision and IDs to species are not possible. One such example would be Eristalinus megacephalus in Africa which still has some RG observations https://www.inaturalist.org/observations?nelat=-8.2032838&nelng=38.2216904&place_id=any&swlat=-47.1313489&swlng=11.4696999&taxon_id=359895 (the issue is that E. megacephalus could also be E. tabanoides and currently there is no reliable info available on how to tell them apart or if they are a species complex). --> validators should be able to know what is identifiable and what should be left at subgenus/genus/subfamily. --> do we have these expert validators on iNat knowing African Syrphidae?
(3) Observations are identified to RG because the community knows the identification was made by someone considered an expert and iNatters blindly agree on such IDs - good if the expert does not make mistakes but bad if the expert makes mistakes IDing something outside their narrow field of proper expertise (expert does not know the regional species or is too lazy to properly check before making an ID) --> validators must strictly not be influenced by "expert" IDs and do their own research before agreeing.
(4) Observations are left at genus or family level, but a finer ID is possible, but the community does not know how and where to look up these species and genera --> validators for this category do not exist
(5) Commonly misidentified species, some due to faulty CV suggestions such as Neomyia which CV suggests to be Calliphoridae or Brachycerus made Bronchus by CV and other commonly misidentified species. --> Is there a way to filter out such problematic taxa for a validation sample?

However - please find an experiment design to look into the tricky species - that would add value to the data set.

Posted by traianbertau 5 days ago

@traianbertau - but in none of the Eristalinus megacephalus currently in southern Africa https://www.inaturalist.org/observations/identify?quality_grade=needs_id%2Cresearch%2Ccasual&taxon_id=359895&place_id=113055 have you stated this, or have you enforced it (by disagreeing when you added a subgenus/genus level ID - I see you have in some other observations).

Posted by tonyrebelo 5 days ago

I received an observation in Norway, where I've never been, so I was a bit confused about the geographical constraints in this version of the experiment. Reading the description, though, I can imagine one way this may have happened:

"We're now requiring at least 3 improving IDs within the same country to try to better match observations and validator experience."

I sometimes open global unknowns or kingdom-level IDs and try to ID them a bit more precisely, in the hopes that more specialized ID'ers will find them. Mostly these are plants, and unless they're local to me or they're a particularly recognizable taxon, I generally don't ID more precisely than class. But given the phrasing, it sounds like if I moved a few observations from "Unknown" to "Pinopsida" or "Magnoliopsida", that counts as an improving ID and I'm thereby qualified to ID in Norway?

Posted by guerrichache 5 days ago

@tonyrebelo

Yes, I should have disagreed, but this is nothing one can support with a scientific paper, just a well known problem.

Excellent work.

Thank you for running the 4th version.

I look forward to participating in any future accuracy experiments.

Posted by mjpapay 5 days ago

As @insectobserver123 pointed out above, 23:59:59 UTC is not the "end of the day" for half of the world, and it wasn't an intuitive interpretation for me. Since this is a global effort, I think it would be a good idea to be more specific about the deadline in the invitation email.

I deferred identifying about half of my batch - mainly ones in distant areas where I wanted to check for local lookalikes - until the final day. I thought I had finished with hours to spare, but apparently I identified dozens of them after the deadline.

Posted by d2b 5 days ago

Yes, I hear "by 5/13", and I think the latest I can put in an ID is either 04:59:59 UTC on 5/12 or 04:59:59 UTC on 5/13, just like my college homework. I would never have imagined the deadline was at dusk.

Posted by insectobserver123 5 days ago

Do you have stats about accuracy versus the number of people who have identified an observation?

Posted by kroeckx 4 days ago

@kroeckx I've never seen accuracy statistics based on the number of identifications but I can guess (based on experience). Observations with strictly less than three identifications are most likely to be in error, I think. Research-grade observations with exactly two identifications are the most insidious case since such observations do not routinely show up in searches.

Posted by trscavo 4 days ago

@kroeckx, do you mean are we storing the ID ledger on sample observations before the experiment kicks off? We are not - but it's something we could do. It's partially possible to rewind/reconstruct an ID ledger to a previous point in time but not perfectly because (1) people can delete IDs and even observations, (2) while only one ID per person can be current at any time there are weird edge cases where someone could, for example, manually mark a more recent ID as not-current and an older ID as current that we can't reconstruct.

Posted by loarie 4 days ago
