A Real, Living, Breathing Human Wrote This, I Swear.
- Margaret Armstrong
- Mar 21
- 5 min read
Generative artificial intelligence is a controversial but popular subject, and it raises many concerns about the integrity and quality of the work people make. In the university setting, policies are in place to manage how AI is used, and the consequences for misuse can be quite serious. Those consequences sting even more when AI was never used in the first place.
My experience with false AI accusations.
I got a comment from a professor saying they wanted to meet with me before grading my assignment, which was writing the more “technical” part of a grant proposal. I wasn’t necessarily proud of what I turned in, but I didn’t think it was THAT bad that it warranted a meeting with my professor. Then, when I met with them, they asked me if I used ChatGPT for the assignment. I was floored.
Admittedly, I might use artificial intelligence for schoolwork when allowed, but I did not touch it for this assignment because it is not permitted for the class. I had no defense ready, and my shock kept me from coming up with one in that moment. All I could say was that I am, evidently, not that great at coming up with the business side of a grant proposal.
My professor didn’t seem to believe me, so I dug through my search history from weeks earlier to show them the research I had done and the absence of ChatGPT while I worked on the assignment. After several email exchanges, they believed me. Congratulations to me! No more fear of undeserved academic misconduct!
The difficulty of proving what is human and what is not.
BUT there’s still one critical issue. Anyone can delete parts of their search history to make it look clean of ChatGPT. What I learned from this experience is that, currently, there is no accurate way to prove someone did or did not use artificial intelligence, unless everyone starts screen-recording everything they do on their laptop (which I doubt people want to do).
As AI becomes more sophisticated, the challenge of distinguishing between AI-generated and human-created content has become increasingly complex. Research indicates that humans can only differentiate AI-generated text about 53% of the time, barely better than random guessing. Even with training, human accuracy in detecting AI-generated content doesn't significantly improve🫠. It’s not looking good for us.
Since humans cannot reliably tell what is human-written, we have to turn back to the same technology to determine what is human and what is not. AI checkers are rising in popularity, but they are not the reliable solutions we want them to be.
The most promising but flawed way to detect AI.
There are TONS of potential AI checkers out there for free. Like ChatGPT, AI checkers are large language models (LLMs) themselves. Whatever text is given to them goes through the model to see whether it was human- or AI-written. Typically, they give a percentage and point out which parts are likely AI-generated text.
AI checkers determine the humanness of text on two different scales: burstiness and perplexity.
Burstiness describes the variance in sentence structure of the text. AI-generated content typically varies much less in sentence structure and length than human-written text. People like to shuffle sentences around to have more fun with their writing, and AI can’t quite emulate that, although AI has improved significantly as it continues to learn the variance in human writing. On the other hand, AI excels at producing formulaic text for more informative, technical pieces like my grant proposal.
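To make burstiness concrete, here is a minimal sketch of how such a score could be computed. This is only my own assumption of the idea: I treat burstiness as the variation in sentence word counts, and the burstiness function and example sentences are hypothetical, not how any real checker works.

```python
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Rough burstiness score: how much sentence lengths vary.

    A score near 0 means every sentence is about the same length
    (more AI-like); higher scores mean more variety (more human-like).
    """
    # Naive sentence split on ., !, and ? -- good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # Coefficient of variation: spread relative to average length.
    return pstdev(lengths) / mean(lengths)

varied = "I ran. The dog, startled by the thunder rolling over the hills, bolted after me. Why?"
uniform = "The dog ran away. The cat ran away. The bird flew away."
print(f"varied:  {burstiness(varied):.2f}")   # higher score
print(f"uniform: {burstiness(uniform):.2f}")  # lower score
```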
Even for this post, I used Perplexity, another AI that acts like a search engine (and always links sources!), to write one of the previous paragraphs. Go back to the one with the emoji. Everything in that paragraph before the emoji was written by Perplexity after I asked it for an essay on the complications of detecting AI. I think it did a decent job, and it gave me a good source!
Measuring the perplexity of the text is the other option. Perplexity, not the AI tool, is basically how much the text surprised the AI, based on the data the AI learned from. If the AI checker sees a sentence that strikes it as strange, that sentence has higher perplexity and is more likely to be human-written. An article on why AI checkers don’t work provides a very simple example of perplexity while demonstrating its flaw as a tool to identify human-written work. The sentence “I’d like a cup of ___” is most likely to end with “water” or another drink, and that is what AI would assume. If the sentence ended with “spiders,” the AI checker would flag it as human-written because a sentence like that typically would not end with “spiders,” giving it high perplexity.
However, it is completely possible that a human would write “I’d like a cup of water,” but because of its low perplexity, it might get flagged as AI-generated text. Likewise, a technical document might get marked as AI-generated because of its formulaic structure and word use. Neither perplexity nor burstiness is a foolproof method to distinguish between human writing and AI writing, so these AI checkers, for the time being, are still unreliable ways to prove oneself in the face of AI accusations. A false positive from an AI checker can have serious consequences for a student who did not use artificial intelligence for an assignment, so, like with AI in general, it is best to use them with a critical lens.
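To see perplexity in action, here is a minimal sketch that scores the “cup of water” versus “cup of spiders” sentences with the small, public GPT-2 model through the Hugging Face transformers library. Using GPT-2 is my own choice for illustration; real checkers use their own models and thresholds, so the numbers only show the direction of the effect.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# A small, public language model to do the "surprise" scoring with.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: higher = more surprising."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the same ids as labels makes the model report its
        # average next-token prediction loss over the text.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())

print(perplexity("I'd like a cup of water."))    # low: very predictable
print(perplexity("I'd like a cup of spiders."))  # higher: surprising ending
```

Running it, the “spiders” sentence should come out with noticeably higher perplexity, which is exactly why a checker would lean toward calling it human-written.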
Biases of AI detectors.
Besides the issues that come with using perplexity and burstiness as tools to measure humanness, studies on AI checkers reveal some implicit biases baked into these programs. They disproportionately mark text written by non-native English speakers as AI-generated content. This links back to the perplexity of the text. Since non-native English speakers might use less varied vocabulary than native English speakers, an AI checker is more likely to assume that a non-native English speaker’s writing is AI-generated due to its lower perplexity. The same occurs for writers with different dialects and backgrounds.
Again, this can result in serious consequences for students or workers who get a false positive for their work. AI checkers failing to account for the variance of English, or even the lack of variance in the case of non-native speakers, is yet another one of their drawbacks. They are definitely tools, but they are certainly not what we want them to be yet.
AI detection for this post.
When I decided on this topic, I thought it would be the perfect opportunity to implement some AI-generated content and see how the AI checkers react to it. I used ZeroGPT and QuillBot, the two that popped up first when I searched for an AI checker. I pasted the paragraph, including the emoji and the last sentence I added, into ZeroGPT, and to my surprise (not really), it said the paragraph was 100% human-written. When I removed the part that I wrote, I only got a score of 75.51%, so ZeroGPT caught it (but not all of it).
For QuillBot, I had to add one of my previous paragraphs because it had a length requirement. I got a clean score with and without the emoji. Even when I pasted in the entirety of this post before this section, QuillBot still said it was human-written. It said 9% was human-written and AI-refined, which was inaccurate; the parts of the text that it marked weren’t even from the paragraph I used Perplexity for. Perhaps because of the informative nature of this piece, the AI-generated content blended right in with the rest of my human-generated content. Either of these checkers might have easily detected AI if I had used it for the whole post, but not everyone uses AI to write entire pieces.
Artificial intelligence is always improving, so these checkers will probably become more reliable in the future and hopefully will stop disproportionately targeting certain populations. For now, they are still learning how to distinguish between human writing and machine writing, just like we are.
Written by Margaret Armstrong.
This was such an interesting topic to read and learn about. It's interesting to me how AI detectors try to tell human and AI writing apart using perplexity and burstiness. I can't imagine what the future of AI will look like, especially when it comes to writing; I can only imagine it will be harder and harder to differentiate. Thanks for sharing!
I found this article very intriguing; specifically, I loved the fact that you had us check back to highlight a part of the article that was AI-generated. It was also very interesting that the AI detection service got different results for portions of the same work. I too am stumped at this point as to how to truly differentiate what is human and what is AI. I'm not sure we will ever reach a resolution, honestly.
I also was accused of using AI for an assignment when I didn't! Ironically, it was a discussion board about the ethical use of AI. I had to show my professor the version history of my Google Doc to prove my innocence. It's getting harder and harder to tell the writing apart!
I really like the topic you chose; the ideas are so relevant to our society and, as a future educator, ever present in my life!! Nice work!