From 82c50458a1ea0e41bb63654fbaf2a6650a8bf979 Mon Sep 17 00:00:00 2001
From: Feanil Patel
Date: Thu, 19 Mar 2020 10:57:14 -0400
Subject: [PATCH 1/4] Add a draft decision related to bokchoy experiment.

---
 .../decisions/0003-reduce-bokchoy-testing.rst | 58 +++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 docs/decisions/0003-reduce-bokchoy-testing.rst

diff --git a/docs/decisions/0003-reduce-bokchoy-testing.rst b/docs/decisions/0003-reduce-bokchoy-testing.rst
new file mode 100644
index 0000000000..35de5c3865
--- /dev/null
+++ b/docs/decisions/0003-reduce-bokchoy-testing.rst
@@ -0,0 +1,58 @@
+Status
+~~~~~~
+Draft
+
+
+Context
+~~~~~~~
+
+edx-platform bokchoy tests are slow, flaky and difficult to debug. A quick assessment of their value shows that they might be more trouble than they are worth. And that we might get the same benefit with far fewer tests.
+
+Baseline Data:(Last 7 days)
+---------------------------
+
+ Total number of builds: 253(across 106 PRs)
+
+ Failures: 49(across 24 PRs)
+ True Failures: 10(across 6 PRs)
+ Failures that wouldn’t be caught by other test: 3(on 1 PR)
+
+Color
+=====
+
+ Of the real failures found, there was one PR which had a failure that was only found via bokchoy and a115 tests.
+ - This PR made a JS change which would have broken many pages from loading.
+
+Recommendation
+--------------
+
+ As an experiment, we should not run bokchoy tests but continue to run a11y tests which will reduce the total number of tests significantly but continue to act as a smoke test for issues that can be caused by the fact that our frontend in edx-platform is still quite highly coupled together.
+
+ We'll run in this mode for a month while we collect more data according to the test plan below. This should give us either the confidence to significantly reduce the number of bokchoy tests or good reasons not to.
+
+Test Plan
+=========
+
+ 1. Deactivate bokchoy tests on master and all PRs but leave a11y tests running.
+ - The a11y tests will act as a proxy for the small number of UI tests that would catch most major issues.
+
+ 2. Collect data on which issues bokchoy would have caught by running them manually font-of-band).
+ - On a Daily cadense for 1 month.
+
+ 3. Assess Impact of change.
+ - We'll record the number of issues that bokchoy would have prevented.
+ - Both True issues and false positives(flakiness).
+
+
+Outcome: Decision on whether or not to reduce the number of bokchoy tests.
+
+Experiment Results
+------------------
+
+TBD
+
+Consequences
+------------
+
+TBD
+

From d0f584ab7e8f96432484c4337d20d40d8c498d7e Mon Sep 17 00:00:00 2001
From: Feanil Patel
Date: Thu, 19 Mar 2020 16:12:23 -0400
Subject: [PATCH 2/4] Update based on feedback.

---
 .../decisions/0003-reduce-bokchoy-testing.rst | 69 +++++++++++--------
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/docs/decisions/0003-reduce-bokchoy-testing.rst b/docs/decisions/0003-reduce-bokchoy-testing.rst
index 35de5c3865..a335f9a83e 100644
--- a/docs/decisions/0003-reduce-bokchoy-testing.rst
+++ b/docs/decisions/0003-reduce-bokchoy-testing.rst
@@ -1,58 +1,67 @@
 Status
-~~~~~~
+======
 Draft
 
 
 Context
-~~~~~~~
+=======
 
 edx-platform bokchoy tests are slow, flaky and difficult to debug. A quick assessment of their value shows that they might be more trouble than they are worth. And that we might get the same benefit with far fewer tests.
 
-Baseline Data:(Last 7 days)
-----------------------------
-
- Total number of builds: 253(across 106 PRs)
-
- Failures: 49(across 24 PRs)
- True Failures: 10(across 6 PRs)
- Failures that wouldn’t be caught by other test: 3(on 1 PR)
-
-Color
-=====
-
- Of the real failures found, there was one PR which had a failure that was only found via bokchoy and a115 tests.
- - This PR made a JS change which would have broken many pages from loading.
-
-Recommendation
+Baseline Data:
 --------------
 
- As an experiment, we should not run bokchoy tests but continue to run a11y tests which will reduce the total number of tests significantly but continue to act as a smoke test for issues that can be caused by the fact that our frontend in edx-platform is still quite highly coupled together.
+ This data was collected based on the results of bokchoy tests run across all edx-platform PRs over the last 7 days.
 
- We'll run in this mode for a month while we collect more data according to the test plan below. This should give us either the confidence to significantly reduce the number of bokchoy tests or good reasons not to.
+ * Total number of builds: 253(across 106 PRs)
+ * Failures: 49(across 24 PRs)
+ * True Failures: 10(across 6 PRs)
+ * Failures that wouldn’t be caught by other test: 3(on 1 PR)
+
+Color
+~~~~~
+
+Of the real failures found, there was one PR which had a failure that was only found via bokchoy and a11y tests.
+ * This PR made a JS change which would have broken many pages from loading.
+
+Recommendation
+==============
+
+Based on the info we have so far, we should only run a suite of smoke tests in bokchoy that ensure the frontend is not entirely broken.
+
+For the experiment, we will use the a11y bokchoy tests as simple stand-in for a suite of smoke tests, because it is already a much smaller suite of happy path tests.
+
+During the experiment, if we find we are missing coverage via a regression, we will first add a missing Python or JavaScript unit test where possible. Only if this isn't possible would we add to the smoke suite of bokchoy tests.
+
+We'll run in this mode for a month while we collect more data according to the test plan below. This should give us either the confidence to significantly reduce the number of bokchoy tests or good reasons not to.
 
 Test Plan
-=========
+---------
 
- 1. Deactivate bokchoy tests on master and all PRs but leave a11y tests running.
- - The a11y tests will act as a proxy for the small number of UI tests that would catch most major issues.
+#. Deactivate bokchoy tests on master and all PRs but leave a11y tests running.
 
- 2. Collect data on which issues bokchoy would have caught by running them manually font-of-band).
- - On a Daily cadense for 1 month.
+ * The a11y tests will act as a proxy for the small number of UI tests that would catch most major issues.
 
- 3. Assess Impact of change.
- - We'll record the number of issues that bokchoy would have prevented.
- - Both True issues and false positives(flakiness).
+#. Collect data on which issues bokchoy would have caught by running them manually out-of-band).
+
+ * We'll look at the failures on the out-of-band bokchoy job to find any true failures that would be caught by the removed tests.
+ * On a Daily cadense for 1 month.
+
+#. Assess Impact of change.
+
+ * We'll record the number of issues that bokchoy would have detected, when we manually run the bokchoy job out-of-band.
+ * Both True issues and false positives(flakiness).
 
 
 Outcome: Decision on whether or not to reduce the number of bokchoy tests.
 
 Experiment Results
--------------------
+==================
 
 TBD
 
 Consequences
--------------
+============
 
 TBD
 

From 144982af8c97bcb8571faedd9c6f630a03594535 Mon Sep 17 00:00:00 2001
From: Feanil Patel
Date: Mon, 23 Mar 2020 09:56:53 -0400
Subject: [PATCH 3/4] Respond to PR feedback.

- Fix indenting.
- Small wording fixes.
---
 .../decisions/0003-reduce-bokchoy-testing.rst | 25 ++++++++++---------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/docs/decisions/0003-reduce-bokchoy-testing.rst b/docs/decisions/0003-reduce-bokchoy-testing.rst
index a335f9a83e..750fa943e9 100644
--- a/docs/decisions/0003-reduce-bokchoy-testing.rst
+++ b/docs/decisions/0003-reduce-bokchoy-testing.rst
@@ -11,12 +11,13 @@ edx-platform bokchoy tests are slow, flaky and difficult to debug. A quick asse
 Baseline Data:
 --------------
 
- This data was collected based on the results of bokchoy tests run across all edx-platform PRs over the last 7 days.
+This data was collected based on the results of bokchoy tests run across all edx-platform PRs over the last 7 days.
 
- * Total number of builds: 253(across 106 PRs)
- * Failures: 49(across 24 PRs)
- * True Failures: 10(across 6 PRs)
- * Failures that wouldn’t be caught by other test: 3(on 1 PR)
+* Total number of builds: 253(across 106 PRs)
+* Failures: 49(across 24 PRs)
+
+ * True Failures: 10(across 6 PRs)
+ * Failures that wouldn’t be caught by other test: 3(on 1 PR)
 
 Color
 ~~~~~
@@ -29,7 +30,7 @@ Recommendation
 
 Based on the info we have so far, we should only run a suite of smoke tests in bokchoy that ensure the frontend is not entirely broken.
 
-For the experiment, we will use the a11y bokchoy tests as simple stand-in for a suite of smoke tests, because it is already a much smaller suite of happy path tests.
+For the experiment, we will use the a11y bokchoy tests as a simple stand-in for a suite of smoke tests, because it is already a much smaller suite of happy path tests.
 
 During the experiment, if we find we are missing coverage via a regression, we will first add a missing Python or JavaScript unit test where possible. Only if this isn't possible would we add to the smoke suite of bokchoy tests.
 
@@ -40,17 +41,17 @@ Test Plan
 
 #. Deactivate bokchoy tests on master and all PRs but leave a11y tests running.
 
- * The a11y tests will act as a proxy for the small number of UI tests that would catch most major issues.
+ * The a11y tests will act as a proxy for the small number of UI tests that would catch most major issues.
 
-#. Collect data on which issues bokchoy would have caught by running them manually out-of-band).
+#. Collect data on which issues bokchoy would have caught by running them manually out-of-band from the standard CI/CD process.
 
- * We'll look at the failures on the out-of-band bokchoy job to find any true failures that would be caught by the removed tests.
- * On a Daily cadense for 1 month.
+ * We'll look at the failures on the out-of-band bokchoy job to find any true failures that would be caught by the removed tests.
+ * On a Daily cadense for 1 month.
 
 #. Assess Impact of change.
 
- * We'll record the number of issues that bokchoy would have detected, when we manually run the bokchoy job out-of-band.
- * Both True issues and false positives(flakiness).
+ * We'll record the number of issues that bokchoy would have detected, when we manually run the bokchoy job out-of-band.
+ * Both True issues and false positives(flakiness).
 
 
 Outcome: Decision on whether or not to reduce the number of bokchoy tests.

From 98a657e8492c0db9f93788504f35a63370b9420e Mon Sep 17 00:00:00 2001
From: Feanil Patel
Date: Wed, 25 Mar 2020 12:23:46 -0400
Subject: [PATCH 4/4] Respond to feedback.

---
 docs/decisions/0003-reduce-bokchoy-testing.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/decisions/0003-reduce-bokchoy-testing.rst b/docs/decisions/0003-reduce-bokchoy-testing.rst
index 750fa943e9..c9f522cf71 100644
--- a/docs/decisions/0003-reduce-bokchoy-testing.rst
+++ b/docs/decisions/0003-reduce-bokchoy-testing.rst
@@ -28,7 +28,7 @@ Of the real failures found, there was one PR which had a failure that was only f
 Recommendation
 ==============
 
-Based on the info we have so far, we should only run a suite of smoke tests in bokchoy that ensure the frontend is not entirely broken.
+Based on the info we have so far, we will only run a suite of smoke tests in bokchoy that ensure the frontend is not entirely broken.
 
 For the experiment, we will use the a11y bokchoy tests as a simple stand-in for a suite of smoke tests, because it is already a much smaller suite of happy path tests.
 
@@ -61,6 +61,11 @@ Experiment Results
 
 TBD
 
+Decision
+========
+
+TBD - Based on experiment outcome.
+
 Consequences
 ============
 