Our team here at Storj Labs is currently in the middle of adding support for CockroachDB, which is a horizontally scalable Postgres compatible database. Each database technology behaves differently and it's important to understand the tradeoffs they make to utilize them efficiently. Along the way, we are learning a lot about CockroachDB and one of the things we have learned is how deletes impact database performance.
CockroachDB uses a technique called MVCC¹ to manage concurrent transactions. Deletes leave behind tombstones to mark records as deleted. Deleted records don't get removed from the primary key/value store or indices until the gc_ttl grace period window expires, which has a default of 25 hours. This means any query using the table has to process more data than you may expect if you assumed all those records were immediately removed from the system. I want to stress, this doesn't violate any transactional guarantees and doesn't return any undesired results. Deleted data appears deleted correctly. This only affects performance. If you're doing sparse deletes this probably won't be noticeable. If you're doing bulk deletes you may notice performance doesn't improve after you have issued deletes until the 25-hour window has expired and has purged the bulk deleted records. Old values changed with updates also get purged when the gc_ttl grace period has expired.
Another thing to consider with CockroachDB deletes is that if you issue too large of a delete statement you may experience a query too large exception. To work around this you can delete records with a limit or some continuous range of the primary key.
Some techniques to consider to mitigate these side effects