Bringing the power of end-to-end testing to Plio’s open-source platform
Vaibhav Rathore
March 22, 2022

Originally posted on Plio’s Blog

End-to-end testing is often discussed in ideal terms — stable environments, modern browsers, and predictable user behavior. In practice, teams building software for real users operate under far more constraints.

This post documents how we approached end-to-end testing while working on Plio, an open-source education platform, and what that work taught us about taking responsibility for quality under real-world conditions.

[Photo: ColoredCow team discussing end-to-end testing decisions during Plio development]
Working through testing decisions during active development — discussing trade-offs and constraints as the system evolved.

Plio is one instance of this work — but the decisions, trade-offs, and constraints described here show up repeatedly in long-running, user-facing systems.

Plio’s priority has always been building a simple and easy-to-use platform. The team invested significant effort in code reviews, unit tests, and multiple rounds of verification before releasing features.

As the platform stabilized, more fundamental questions surfaced:

How do we ensure the system works as a whole when an actual user loads it up?
How do we verify that independently tested components behave correctly together?
Is there a way to test Plio as a real user would — and do so automatically?

These questions marked the point where confidence based on individual components was no longer enough.

End-to-End Testing

End-to-end testing (E2E testing) involves validating an application’s workflow from beginning to end by simulating real user interactions. The goal is not just to test isolated functionality, but to verify that the system behaves correctly when all of its components work together.

A typical E2E test might involve:

  1. Opening a browser
  2. Visiting the login page
  3. Entering credentials
  4. Navigating through the application
  5. Performing a real user action
  6. Verifying the expected outcome

This mirrors how a real user experiences the system.

Selecting an E2E tool

At this point, we had to choose an E2E testing tool that aligned with Plio’s constraints — not just what was popular or powerful.

We evaluated tools such as Selenium and Cypress, each with clear strengths:

  • Selenium is a mature and widely adopted tool, but comes with a heavier setup and operational overhead than we were comfortable introducing at that stage.
  • Cypress has excellent developer experience and strong community support, but lacked reliable Safari support at the time — a significant limitation, given that a meaningful portion of Plio’s users accessed the platform via Safari.

This forced a conscious trade-off.

We were not optimizing for ecosystem popularity or feature completeness. We were optimizing for real user coverage under existing constraints.

That led us to choose TestCafe, which offered:

  • simpler setup
  • cross-browser support, including Safari
  • enough flexibility to model real user workflows without excessive complexity

This decision was less about TestCafe being “the best tool” and more about it being the right tool for this context.
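
To give a sense of what that simpler setup meant in practice, here is roughly what getting started looks like. This is a sketch, and the tests/ path is a placeholder; unlike Selenium, TestCafe does not require separate browser driver binaries, so a local run is a single command:

    # Illustrative: install TestCafe as a dev dependency, then run the
    # suite against a locally installed Safari (the tests/ path is a placeholder)
    npm install --save-dev testcafe
    npx testcafe safari tests/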

Writing Test Cases

Writing the initial test cases was straightforward. TestCafe made it easy to model browser actions and add assertions.
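
To make that concrete, here is a minimal sketch of a TestCafe test in the shape we used. The page URL, selectors, and credentials below are illustrative placeholders, not Plio’s actual markup:

    import { Selector } from 'testcafe';

    // Illustrative fixture: the page URL and selectors are placeholders
    fixture('Login flow')
        .page('https://app.example.com/login');

    test('a user can log in and reach the home page', async t => {
        await t
            // Simulate a real user filling in the login form
            .typeText(Selector('#email'), 'teacher@example.com')
            .typeText(Selector('#password'), 'example-password')
            .click(Selector('#login-button'))
            // Assert on what the user would actually see next
            .expect(Selector('#home').exists).ok();
    });

Each action chains into the next, which keeps a test readable as a sequence of user steps rather than a pile of low-level browser commands.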

More importantly, we made a conscious decision not to chase exhaustive coverage at this stage. The goal was to protect critical user flows and understand system behavior — not to automate every possible interaction.

This restraint helped keep the test suite maintainable and meaningful.

Continuous Integration

Once local test cases were stable, the next responsibility was to ensure they ran automatically.

Manual execution would have undermined the very confidence we were trying to build.

We integrated E2E tests into the CI pipeline using GitHub Actions, ensuring that:

  • tests ran on every pull request
  • failures were visible early
  • confidence was shared, not dependent on individual effort

This shifted testing from an optional activity to an owned system responsibility.
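
As a sketch, the workflow looked along these lines. The file name, action versions, and the test:e2e script are illustrative, not Plio’s exact configuration:

    # .github/workflows/e2e.yml (illustrative sketch)
    name: E2E tests

    on: pull_request

    jobs:
      e2e:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - uses: actions/setup-node@v2
            with:
              node-version: 16
          - run: npm ci
          # Assumes a package.json script that serves the app
          # and runs the TestCafe suite against it
          - run: npm run test:e2e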

BrowserStack

While the CI setup worked as expected, it surfaced a limitation we could not ignore.

The CI environment validated behavior only on a narrow set of browsers running on Ubuntu, a configuration that covered a tiny fraction of Plio’s real user base.

At this point, we had a choice:

  • accept partial confidence and move on, or
  • extend testing to reflect real user environments

We chose the latter.

Using BrowserStack, we expanded E2E testing across:

  • multiple operating systems
  • multiple browsers, including Safari
  • mobile and constrained environments

This did not magically eliminate all risk — but it made system behavior far more visible under realistic conditions.
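
TestCafe connects to BrowserStack through a browser provider plugin, testcafe-browser-provider-browserstack. Roughly, a cross-browser run can be invoked as below; the credentials are placeholders, and the browser aliases actually available can be listed with testcafe -b browserstack:

    # Illustrative: run the suite on a remote Safari via BrowserStack
    npm install --save-dev testcafe-browser-provider-browserstack

    export BROWSERSTACK_USERNAME="your-username"
    export BROWSERSTACK_ACCESS_KEY="your-access-key"

    # Browser aliases follow the pattern browser@version:OS
    npx testcafe "browserstack:safari@13.1:OS X Catalina" tests/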

[Recording: an automated E2E test running on Firefox on macOS Catalina via BrowserStack]

What we chose not to optimize for (explicit non-decisions)

It’s worth naming what we deliberately did not optimize for:

  • full test coverage across all features
  • perfectly deterministic E2E runs in every environment
  • replacing other forms of testing

Those choices would have increased complexity without meaningfully improving confidence at that stage.

What this work represents beyond Plio

This work shaped how we think about testing in other systems as well — especially those operating under real-world constraints, diverse user environments, and high responsibility.

While tools, stacks, and domains vary, the underlying responsibility remains the same:
to build confidence that the system behaves as intended when it matters most.

Plio was one instance of that responsibility being exercised.

Final words

As Martin Fowler has noted, end-to-end tests offer the highest confidence when deciding whether software is working as intended.

Used thoughtfully, they help teams:

  • reduce breaking changes
  • release more confidently
  • protect real users

That confidence — not tooling — is the real outcome.

Questions teams often ask after this work

When does end-to-end testing actually become necessary in a real system?
Usually, when confidence in system behavior depends more on individual vigilance than on shared mechanisms, and when changes start feeling riskier than they should.


Is it practical to aim for full end-to-end test coverage?
In constrained environments, chasing full coverage often increases complexity and maintenance cost without providing proportionate confidence.


How do you choose an E2E testing tool when user environments are fragmented?
Tool choice needs to reflect real user behavior and constraints. In our case, cross-browser support mattered more than ecosystem popularity or feature completeness.


Can end-to-end testing replace other forms of testing?
No. End-to-end tests complement unit and integration tests by validating system behavior as a whole, not by replacing lower-level checks.