Originally posted on Plio’s Blog
End-to-end testing is often discussed in ideal terms — stable environments, modern browsers, and predictable user behavior. In practice, teams building software for real users operate under far more constraints.
This post documents how we approached end-to-end testing while working on Plio, an open-source education platform, and what that work taught us about taking responsibility for quality under real-world conditions.

Plio is one instance of this work — but the decisions, trade-offs, and constraints described here show up repeatedly in long-running, user-facing systems.
Plio’s priority has always been building a simple and easy-to-use platform. The team invested significant effort in code reviews, unit tests, and multiple rounds of verification before releasing features.
As the platform stabilized, a more fundamental question surfaced:
How do we ensure the system works as a whole when an actual user loads it up?
How do we verify that independently tested components behave correctly together?
Is there a way to test Plio as a real user would — and do so automatically?
These questions marked the point where confidence based on individual components was no longer enough.
End-to-end testing (E2E testing) involves validating an application’s workflow from beginning to end by simulating real user interactions. The goal is not just to test isolated functionality, but to understand whether the system behaves correctly when all components interact together.
A typical E2E test might involve:
- loading the application in a real browser
- logging in as a user would
- navigating to a core feature and interacting with it
- verifying that the expected result actually appears on screen
This mirrors how a real user experiences the system.
At this point, we had to choose an E2E testing tool that aligned with Plio’s constraints — not just what was popular or powerful.
We evaluated tools such as Selenium and Cypress, each with clear strengths:
- Selenium brings a mature, widely adopted ecosystem and years of community tooling.
- Cypress brings a polished developer experience and a rich feature set.
This forced a conscious trade-off.
We were not optimizing for ecosystem popularity or feature completeness. We were optimizing for real user coverage under existing constraints.
That led us to choose TestCafe, which offered:
- cross-browser testing without WebDriver or per-browser plugins
- tests written in plain JavaScript or TypeScript
- a setup simple enough to run locally, in CI, and against cloud browser grids
This decision was less about TestCafe being “the best tool” and more about it being the right tool for this context.
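Part of that fit was how little setup it required. Assuming a standard Node.js project, something like the following is enough to run a first test locally (the test path here is illustrative):

```sh
npm install --save-dev testcafe
# No WebDriver or browser plugins needed: TestCafe drives the locally installed browser.
npx testcafe chrome tests/e2e/login.js
```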
Writing the initial test cases was straightforward. TestCafe made it easy to model browser actions and add assertions.
More importantly, we made a conscious decision not to chase exhaustive coverage at this stage. The goal was to protect critical user flows and understand system behavior — not to automate every possible interaction.
This restraint helped keep the test suite maintainable and meaningful.
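To make that restraint concrete, here is a hedged sketch of what one such test can look like. The page URL, selectors, and tag values are illustrative stand-ins rather than Plio's actual markup; the pattern of tagging critical flows and filtering on the tag uses TestCafe's standard metadata mechanism.

```js
import { Selector } from 'testcafe';

// Tag the fixture so CI can run just the flows we consider critical.
// The URL, selectors, and tag below are illustrative, not Plio's real setup.
fixture('Learner takes a plio')
  .page('https://app.example.com/play/abc123')
  .meta({ priority: 'critical' });

test('a learner can watch the video and answer a question', async (t) => {
  await t
    .click(Selector('#play-button'))
    // The interactive question is expected to pause the video.
    .click(Selector('input[type="radio"]').nth(0))
    .click(Selector('button').withText('Submit'))
    // Assert on what the user actually sees, not on internal state.
    .expect(Selector('.feedback').exists)
    .ok();
});
```

With tags in place, `npx testcafe chrome tests/e2e/ --test-meta priority=critical` runs only the flows marked critical, which keeps the suite small and intentional.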

Once local test cases were stable, the next responsibility was to ensure they ran automatically.
Manual execution would have undermined the very confidence we were trying to build.
We integrated E2E tests into the CI pipeline using GitHub Actions, ensuring that:
- the suite runs automatically on every push and pull request
- failures surface before changes are merged, not after users encounter them
This shifted testing from an optional activity to an owned system responsibility.
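For reference, a minimal workflow along these lines can look like the sketch below. The Node version, file paths, and browser choice are assumptions, not Plio's actual configuration.

```yaml
# .github/workflows/e2e.yml -- an illustrative sketch, not Plio's actual workflow.
name: E2E tests

on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '14'
      - run: npm ci
      # Headless Chromium is available on the Ubuntu runner out of the box.
      - run: npx testcafe "chrome:headless" tests/e2e/
```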
While the CI setup worked as expected, it surfaced a limitation we could not ignore.
The CI environment only validated behavior on a narrow set of browsers running on Ubuntu. This represented a tiny fraction of Plio’s real user base.
At this point, we had a choice: accept that narrow coverage as good enough, or extend testing to the browser and operating system combinations our users actually relied on.
We chose the latter.
Using BrowserStack, we expanded E2E testing across a much wider matrix of browsers and operating systems, reflecting the environments Plio's users actually run.
This did not magically eliminate all risk — but it made system behavior far more visible under realistic conditions.
A recording of an automated E2E test running on Firefox on macOS Catalina via BrowserStack accompanies the original post.
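These remote runs use TestCafe's BrowserStack browser provider plugin. A sketch of the setup follows; the credentials and the exact browser/OS alias are placeholders, and the real aliases available to an account can be listed with `testcafe -b browserstack`.

```sh
# Illustrative commands; credentials and the alias below are placeholders.
npm install --save-dev testcafe-browser-provider-browserstack

export BROWSERSTACK_USERNAME="<username>"
export BROWSERSTACK_ACCESS_KEY="<access-key>"

# Run the same suite on a remote Firefox + macOS Catalina machine.
npx testcafe "browserstack:Firefox@71.0:OS X Catalina" tests/e2e/
```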
It’s worth naming that there were things we deliberately did not optimize for, such as exhaustive coverage of every interaction or every possible browser version. Those choices would have increased complexity without meaningfully improving confidence at that stage.
This work shaped how we think about testing in other systems as well — especially those operating under real-world constraints, diverse user environments, and high responsibility.
While tools, stacks, and domains vary, the underlying responsibility remains the same:
to build confidence that the system behaves as intended when it matters most.
Plio was one instance of that responsibility being exercised.
As Martin Fowler has noted, end-to-end tests offer the highest confidence when deciding whether software is working as intended.
Used thoughtfully, they help teams catch failures that component-level tests miss and ship changes with justified confidence.
That confidence — not tooling — is the real outcome.
When does end-to-end testing actually become necessary in a real system?
Usually, when confidence in system behavior depends more on individual vigilance than on shared mechanisms, and when changes start feeling riskier than they should.
Is it practical to aim for full end-to-end test coverage?
In constrained environments, chasing full coverage often increases complexity and maintenance cost without providing proportionate confidence.
How do you choose an E2E testing tool when user environments are fragmented?
Tool choice needs to reflect real user behavior and constraints. In our case, cross-browser support mattered more than ecosystem popularity or feature completeness.
Can end-to-end testing replace other forms of testing?
No. End-to-end tests complement unit and integration tests by validating system behavior as a whole, not by replacing lower-level checks.