Tutorial: Compare fasterp and fastp
This tutorial walks through downloading real sequencing data, running both fasterp and fastp, and comparing their outputs and reports.
Prerequisites
- fasterp installed (
cargo install --git https://github.com/drbh/fasterp.git) - fastp installed (installation guide)
- curl for downloading files
- jq (optional) for inspecting JSON reports
Step 1: Download Test Data
Download paired-end FASTQ files from SRX2987343 - a chromatin accessibility study on mouse hair follicle stem cells, examining how DNA packaging changes during differentiation into hair-related cell types.
# Create a working directory
mkdir -p fasterp_tutorial && cd fasterp_tutorial
# Download R1 (forward reads) - 365 MB
curl -LO https://genedata.dholtz.com/SRX2987343/SRR5808766_1.fastq.gz
# Download R2 (reverse reads) - 364 MB
curl -LO https://genedata.dholtz.com/SRX2987343/SRR5808766_2.fastq.gz
# Verify downloads
ls -lh *.gz
Expected output:
-rw-r--r-- 1 user staff 365M SRR5808766_1.fastq.gz
-rw-r--r-- 1 user staff 364M SRR5808766_2.fastq.gz
Step 2: Run fastp
Process the paired-end data with fastp:
fastp \
-i SRR5808766_1.fastq.gz -I SRR5808766_2.fastq.gz \
-o fastp_out_R1.fq.gz -O fastp_out_R2.fq.gz \
-j fastp_report.json \
-h fastp_report.html
Expected output:
Read1 before filtering:
total reads: 9799076
total bases: 499752876
Q20 bases: 494840792(99.0171%)
Q30 bases: 490032403(98.0549%)
Q40 bases: 333003632(66.6337%)
Read2 before filtering:
total reads: 9799076
total bases: 499752876
Q20 bases: 491032411(98.255%)
Q30 bases: 485393847(97.1268%)
Q40 bases: 335965361(67.2263%)
Read1 after filtering:
total reads: 9761787
total bases: 485985544
Q20 bases: 481444894(99.0657%)
Q30 bases: 476939326(98.1386%)
Q40 bases: 324904111(66.8547%)
Read2 after filtering:
total reads: 9761787
total bases: 485985544
Q20 bases: 479080210(98.5791%)
Q30 bases: 473663787(97.4646%)
Q40 bases: 327920585(67.4754%)
Filtering result:
reads passed filter: 19523574
reads failed due to low quality: 74384
reads failed due to too many N: 194
reads failed due to too short: 0
reads with adapter trimmed: 3113628
bases trimmed due to adapters: 23734734
Duplication rate: 28.3885%
Insert size peak (evaluated by paired-end reads): 51
JSON report: fastp_report.json
HTML report: fastp_report.html
fastp v1.0.1, time used: 11 seconds
Step 3: Run fasterp
Process the same data with fasterp:
fasterp \
-i SRR5808766_1.fastq.gz -I SRR5808766_2.fastq.gz \
-o fasterp_out_R1.fq.gz -O fasterp_out_R2.fq.gz \
-j fasterp_report.json \
--html fasterp_report.html
Expected output:
Detecting adapter sequence for read1...
No adapter detected for read1
Detecting adapter sequence for read2...
No adapter detected for read2
Read1 before filtering:
total reads: 9799076
total bases: 499752876
Q20 bases: 494840792(99.0171%)
Q30 bases: 490032403(98.0549%)
Q40 bases: 333003632(66.6337%)
Read2 before filtering:
total reads: 9799076
total bases: 499752876
Q20 bases: 491032411(98.2550%)
Q30 bases: 485393847(97.1268%)
Q40 bases: 335965361(67.2263%)
Read1 after filtering:
total reads: 9761787
total bases: 485985544
Q20 bases: 481444894(99.0657%)
Q30 bases: 476939326(98.1386%)
Q40 bases: 324904111(66.8547%)
Read2 after filtering:
total reads: 9761787
total bases: 485985544
Q20 bases: 479080210(98.5791%)
Q30 bases: 473663787(97.4646%)
Q40 bases: 327920585(67.4754%)
Filtering result:
reads passed filter: 19523574
reads failed due to low quality: 74384
reads failed due to too many N: 194
reads failed due to too short: 0
reads with adapter trimmed: 3113628
bases trimmed due to adapters: 23734734
Duplication rate: 28.3885%
Insert size peak (evaluated by paired-end reads): 51
JSON report: fasterp_report.json
HTML report: fasterp_report.html
fasterp v0.1.0, time used: 6 seconds
Note: fasterp processed ~10 million read pairs in 6 seconds vs fastp’s 11 seconds.
Step 4: Compare Outputs
Verify identical FASTQ output
When comparing gzipped files, the compressed hashes may differ due to compression metadata. Compare the decompressed content:
# Compare decompressed R1 content
gunzip -c fastp_out_R1.fq.gz | shasum -a 256
gunzip -c fasterp_out_R1.fq.gz | shasum -a 256
# Compare decompressed R2 content
gunzip -c fastp_out_R2.fq.gz | shasum -a 256
gunzip -c fasterp_out_R2.fq.gz | shasum -a 256
Expected output:
716d6bc9b5aa075e5f7ff527b1638de1ea56b67439b7a7646bfe25fb14d132e7 -
716d6bc9b5aa075e5f7ff527b1638de1ea56b67439b7a7646bfe25fb14d132e7 -
c28e4e14c58ded4f4a7b776f6fb9f92cabc83056253f0ff4e268e5db1653b656 -
c28e4e14c58ded4f4a7b776f6fb9f92cabc83056253f0ff4e268e5db1653b656 -
The hashes match - both tools produced identical output.
Compare key statistics
# Total reads before/after filtering
echo "=== fastp ==="
jq '{
before: .summary.before_filtering.total_reads,
after: .summary.after_filtering.total_reads,
passed_rate: .summary.after_filtering.total_reads / .summary.before_filtering.total_reads * 100
}' fastp_report.json
echo "=== fasterp ==="
jq '{
before: .summary.before_filtering.total_reads,
after: .summary.after_filtering.total_reads,
passed_rate: .summary.after_filtering.total_reads / .summary.before_filtering.total_reads * 100
}' fasterp_report.json
Expected output:
=== fastp ===
{
"before": 19598152,
"after": 19523574,
"passed_rate": 99.61946412090282
}
=== fasterp ===
{
"before": 19598152,
"after": 19523574,
"passed_rate": 99.61946412090282
}
Compare k-mer counts
# Top k-mers should be identical
echo "=== fastp k-mers ==="
jq '.read1_before_filtering.kmer_count | to_entries | sort_by(-.value) | .[0:5]' fastp_report.json
echo "=== fasterp k-mers ==="
jq '.read1_before_filtering.kmer_count | to_entries | sort_by(-.value) | .[0:5]' fasterp_report.json
Expected output:
=== fastp k-mers ===
[
{ "key": "CTGTC", "value": 1651445 },
{ "key": "TGTCT", "value": 1651009 },
{ "key": "TCTCT", "value": 1576849 },
{ "key": "GTCTC", "value": 1358458 },
{ "key": "CTCTT", "value": 1314738 }
]
=== fasterp k-mers ===
[
{ "key": "CTGTC", "value": 1651445 },
{ "key": "TGTCT", "value": 1651009 },
{ "key": "TCTCT", "value": 1576849 },
{ "key": "GTCTC", "value": 1358458 },
{ "key": "CTCTT", "value": 1314738 }
]
Step 5: View Reports
Open the HTML reports in your browser to compare visualizations:
# macOS
open fastp_report.html fasterp_report.html
# Linux
xdg-open fastp_report.html && xdg-open fasterp_report.html
# Or start a local server
python3 -m http.server 8000
# Then visit http://localhost:8000
Example Reports
Here are the actual reports generated from this tutorial:
| Tool | Report |
|---|---|
| fastp | fastp_report.html |
| fasterp | fasterp_report.html |
Compare these side-by-side to see identical quality metrics, base content graphs, and k-mer distributions.
Cleanup
# Remove generated files
rm -f *.gz fastp_report.* fasterp_report.*
cd .. && rmdir fasterp_tutorial
Next Steps
- Try with your own FASTQ data
- Experiment with different filtering parameters
- Check the CLI Reference for all available options