feat: management command to purge information about web certificates (#36287)

for Open edX  operators who still have users with legacy PDF certificates, retirement requires first extracting information from the user's GeneratedCertificate record in order to delete the 4 associated files for each PDF certificate, and then removing the links to the relevant files.  this creates a management command to do that work.

After thinking about it, I have removed the update to `status` from this management command, as per the original specification of the ticket. I added it for completeness originally, but was already uncomfortable, because it's not exactly accurate. The `CertificateStatuses` enum does define a `deleted` status:

```
    deleted             - The PDF certificate has been deleted.
```

but I think it's inappropriate to use here.

#### Why not use `CertificateStatuses.deleted` in the first place

There are multiple places in the code where it's clear that almost all of the statuses are legacy and unused (eg. [Example 1](6c6fd84e53/lms/djangoapps/certificates/data.py (L12-L34)), [Example 2](1029de5537/common/djangoapps/student/helpers.py (L491-L492))). There are innumerable APIs in the system that have expectations about what might possibly be returned from a `GeneratedCertificate.status` object, and none of them is expecting `deleted`

#### Why not revoke the certificate

Ultimately, the certificate isn't revoked, which has a specific meaning around saying it was unearned. The certificate was earned; it has simply been deleted.  We should not be kicking off program certificate invalidation, because that's not what's happening. We should be trusting the normal user retirement process to remove/purge PII from any program certificates that might exist. The nature of web certificates simply means that we are going through this process outside of the normal retirement flow. The normal retirement flow can be trusted to implement any certificate object revocation/removal/PII-purging, and doing an extra step outside of that flow is counterproductive.

#### Why not robustly add a flow for `CertificateStatuses.deleted`

When PDF certificates were removed from the system, they weren't removed in their entirety. Instead, we have this vestigial remains of PDF certificates code, just enough to allow learners to display and use the ones that they already have, without any of the support systems for modifying them. Adding a `deleted` status, verifying that all other APIs wouldn't break in the presence of a certificate with that status, adding the signals to process and propagate the change: all of this would be adding more tech debt upon the already existing technical debt which is the PDF certs code. Better to simply add this one necessary data integrity change, and focus on a process which might allow us to eventually remove the web certificates code.

#### Why it is good enough to ignore the status

The original ask was simply to enforce data integrity: to remove links to files that have been deleted, as an indication that they've been deleted. I only added `status` update out of a (misplaced but well-intentioned) completionist urge.

FIXES: APER-3889
This commit is contained in:
Deborah Kaplan
2025-02-21 14:43:04 -05:00
committed by GitHub
parent b6489e718c
commit ecf5aee297
5 changed files with 264 additions and 0 deletions

View File

@@ -22,6 +22,7 @@ from lms.djangoapps.certificates.models import (
CertificateTemplateAsset,
GeneratedCertificate,
ModifiedCertificateTemplateCommandConfiguration,
PurgeReferencestoPDFCertificatesCommandConfiguration,
)
@@ -103,6 +104,11 @@ class CertificateGenerationCommandConfigurationAdmin(ConfigurationModelAdmin):
pass
@admin.register(PurgeReferencestoPDFCertificatesCommandConfiguration)
class PurgeReferencestoPDFCertificatesCommandConfigurationAdmin(ConfigurationModelAdmin):
pass
class CertificateDateOverrideAdmin(admin.ModelAdmin):
"""
# Django admin customizations for CertificateDateOverride model

View File

@@ -0,0 +1,104 @@
"""
A management command designed to be part of the retirement pipeline for any Open EdX operators
with users who still have legacy PDF certificates.
Once an external process has run to remove the four files comprising a legacy PDF certificate,
this management command will remove the reference to the file from the certificate record.
Note: it is important to retain the reference in the certificate table until
the files have been deleted, because that reference is the files' identifying descriptor.
"""
import logging
import shlex
from django.contrib.auth import get_user_model
from django.core.management.base import BaseCommand, CommandError
from lms.djangoapps.certificates.models import (
GeneratedCertificate,
PurgeReferencestoPDFCertificatesCommandConfiguration,
)
User = get_user_model()
log = logging.getLogger(__name__)
class Command(BaseCommand):
"""
Doesn't invoke the custom save() function defined as part of the `GeneratedCertificate`
model; perforce will emit no Django signals. This is desired behavior. We are
using this management command to purge information that was never sent to any
other systems, so we don't need to propagate updates.
Example usage:
# Dry Run (preview changes):
$ ./manage.py lms purge_references_to_pdf_certificates --dry-run
# Purge data:
$ ./manage.py lms purge_references_to_pdf_certificates
"""
help = """Purges references to PDF certificates. Intended to be run after the files have been deleted."""
def add_arguments(self, parser):
parser.add_argument(
"--dry-run",
action="store_true",
help="Shows a preview of what users would be affected by running this management command.",
)
parser.add_argument(
"--certificate_ids",
nargs="+",
dest="certificate_ids",
help="space-separated list of GeneratedCertificate IDs to clean up",
)
parser.add_argument(
"--args-from-database",
action="store_true",
help=(
"Use arguments from the PurgeReferencesToPDFCertificatesCommandConfiguration "
"model instead of the command line"
),
)
def get_args_from_database(self):
"""
Returns an options dictionary from the current CertificateGenerationCommandConfiguration model.
"""
config = PurgeReferencestoPDFCertificatesCommandConfiguration.current()
if not config.enabled:
raise CommandError(
"PurgeReferencestoPDFCertificatesCommandConfiguration is disabled, "
"but --args-from-database was requested"
)
args = shlex.split(config.arguments)
parser = self.create_parser("manage.py", "purge_references_to_pdf_certificates")
return vars(parser.parse_args(args))
def handle(self, *args, **options):
# database args will override cmd line args
if options["args_from_database"]:
options = self.get_args_from_database()
if options["dry_run"]:
dry_run_string = "[DRY RUN] "
else:
dry_run_string = ""
certificate_ids = options.get("certificate_ids")
if not certificate_ids:
raise CommandError("You must specify one or more certificate IDs")
log.info(
f"{dry_run_string}Purging download_url and download_uri "
f"from the following certificate records: {certificate_ids}"
)
if not options["dry_run"]:
GeneratedCertificate.objects.filter(id__in=certificate_ids).update(
download_url="",
download_uuid="",
)

View File

@@ -0,0 +1,101 @@
"""
Tests for the `purge_references_to_pdf_certificates` management command.
"""
import uuid
from django.core.management import CommandError, call_command
from testfixtures import LogCapture
from common.djangoapps.student.tests.factories import UserFactory
from lms.djangoapps.certificates.models import GeneratedCertificate
from lms.djangoapps.certificates.tests.factories import GeneratedCertificateFactory
from xmodule.modulestore.tests.django_utils import ModuleStoreTestCase
from xmodule.modulestore.tests.factories import CourseFactory
class PurgeReferencesToPDFCertificatesTests(ModuleStoreTestCase):
"""
Tests for the `purge_references_to_pdf_certificates` management command.
"""
def setUp(self):
super().setUp()
self.user = UserFactory()
self.course_run_1 = CourseFactory()
self.course_run_2 = CourseFactory()
self.course_run_3 = CourseFactory()
self.cert_1 = GeneratedCertificateFactory(
user=self.user,
course_id=self.course_run_1.id,
download_url="http://example.com/1",
download_uuid=uuid.uuid4(),
grade=1.00,
)
self.cert_2 = GeneratedCertificateFactory(
user=self.user,
course_id=self.course_run_2.id,
download_url="http://example.com/2",
download_uuid=uuid.uuid4(),
grade=2.00,
)
self.cert_3 = GeneratedCertificateFactory(
user=self.user,
course_id=self.course_run_3.id,
download_url="http://example.com/3",
download_uuid=uuid.uuid4(),
grade=3.00,
)
def test_command_with_missing_certificate_ids(self):
"""
Verify command with a missing certificate_ids param.
"""
with self.assertRaises(CommandError):
call_command("purge_references_to_pdf_certificates")
def test_management_command(self):
"""
Verify the management command purges expected data from only the certs requested.
"""
call_command(
"purge_references_to_pdf_certificates",
"--certificate_ids",
self.cert_2.id,
self.cert_3.id,
)
cert1_post = GeneratedCertificate.objects.get(id=self.cert_1.id)
cert2_post = GeneratedCertificate.objects.get(id=self.cert_2.id)
cert3_post = GeneratedCertificate.objects.get(id=self.cert_3.id)
self.assertEqual(cert1_post.download_url, "http://example.com/1")
self.assertNotEqual(cert1_post.download_uuid, "")
self.assertEqual(cert2_post.download_url, "")
self.assertEqual(cert2_post.download_uuid, "")
self.assertEqual(cert3_post.download_url, "")
self.assertEqual(cert3_post.download_uuid, "")
def test_management_command_dry_run(self):
"""
Verify that the management command does not purge any data when invoked with the `--dry-run` flag
"""
expected_log_msg = (
"[DRY RUN] Purging download_url and download_uri "
f"from the following certificate records: {list(str(self.cert_3.id))}"
)
with LogCapture() as logger:
call_command(
"purge_references_to_pdf_certificates",
"--dry-run",
"--certificate_ids",
self.cert_3.id,
)
cert3_post = GeneratedCertificate.objects.get(id=self.cert_3.id)
self.assertEqual(cert3_post.download_url, "http://example.com/3")
self.assertNotEqual(cert3_post.download_uuid, "")
assert logger.records[0].msg == expected_log_msg

View File

@@ -0,0 +1,29 @@
# Generated by Django 4.2.18 on 2025-02-20 22:36
from django.conf import settings
from django.db import migrations, models
import django.db.models.deletion
class Migration(migrations.Migration):
dependencies = [
migrations.swappable_dependency(settings.AUTH_USER_MODEL),
('certificates', '0037_fix_legacy_broken_invalid_certs'),
]
operations = [
migrations.CreateModel(
name='PurgeReferencestoPDFCertificatesCommandConfiguration',
fields=[
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('change_date', models.DateTimeField(auto_now_add=True, verbose_name='Change date')),
('enabled', models.BooleanField(default=False, verbose_name='Enabled')),
('arguments', models.TextField(blank=True, default='', help_text="Arguments for the 'purge_references_to_pdf_certificates' management command. Specify like '--certificate_ids <id1> <id2>'")),
('changed_by', models.ForeignKey(editable=False, null=True, on_delete=django.db.models.deletion.PROTECT, to=settings.AUTH_USER_MODEL, verbose_name='Changed by')),
],
options={
'verbose_name': 'purge_references_to_pdf_certificates argument',
},
),
]

View File

@@ -1289,6 +1289,30 @@ class CertificateGenerationCommandConfiguration(ConfigurationModel):
return str(self.arguments)
class PurgeReferencestoPDFCertificatesCommandConfiguration(ConfigurationModel):
"""
Manages configuration for a run of the purge_references_to_pdf_certificates management command.
.. no_pii:
"""
class Meta:
app_label = "certificates"
verbose_name = "purge_references_to_pdf_certificates argument"
arguments = models.TextField(
blank=True,
help_text=(
"Arguments for the 'purge_references_to_pdf_certificates' management command. "
"Specify like '--certificate_ids <id1> <id2>'"
),
default="",
)
def __str__(self):
return str(self.arguments)
class CertificateDateOverride(TimeStampedModel):
"""
Model to manually override a given certificate date with the given date.