updated readme and cli tool

main
Antonio De Lucreziis 4 weeks ago
parent f0e93e92a2
commit 0d12554e93

@ -1,24 +1,51 @@
# Progetto ASD 2023/2024
## Usage
Creation and analysis of a Pangenome Graph
## Goals
- [x] Load a Pangenome Graph from the GFA format
- [x] Classify the graph's edges by type: tree, back, forward, cross
- `cargo run -- --help`
- [x] Remove all back edges to turn the graph into a DAG
Display help message.
- [x] Restrict the graph to its largest connected component
- `cargo run -- show -i ./dataset/example.gfa`
- [x] Reconstruct the graph's node sequences based on the paths and on each node's traversal direction (forward or reverse)
Parses the GFA file and prints the graph and various information.
- [x] Search for a k-mer pattern in these sequences using a rolling hash
- To visualize the graph using EGUI, go to `examples/configurable` and run:
- [~] Compute the occurrence frequencies of all k-mers in the graph
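
The tree/back/forward/cross classification in the goals above is the standard labelling produced by a depth-first search. The following is only a minimal sketch of that idea, not the project's implementation: the names `EdgeType` and `classify_edges` and the adjacency-list input are assumptions made for illustration. Dropping every edge labelled `Back` leaves a DAG.

```rust
use std::collections::HashMap;

/// Edge kinds produced by a depth-first search on a directed graph.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeType {
    Tree,
    Back,
    Forward,
    Cross,
}

/// Classifies every edge of a directed graph given as adjacency lists
/// (`adj[u]` lists the successors of node `u`). Parallel edges share one entry.
fn classify_edges(adj: &[Vec<usize>]) -> HashMap<(usize, usize), EdgeType> {
    let n = adj.len();
    let mut discovery = vec![0usize; n]; // 0 means "not visited yet"
    let mut finished = vec![false; n];
    let mut clock = 0usize;
    let mut result = HashMap::new();

    for root in 0..n {
        if discovery[root] != 0 {
            continue;
        }
        clock += 1;
        discovery[root] = clock;
        // Explicit stack of (node, index of the next successor to examine),
        // so that deep graphs do not overflow the call stack.
        let mut stack = vec![(root, 0usize)];
        while let Some(&(u, i)) = stack.last() {
            if i < adj[u].len() {
                stack.last_mut().unwrap().1 += 1;
                let v = adj[u][i];
                let kind = if discovery[v] == 0 {
                    // First time we reach v: the edge belongs to the DFS tree.
                    clock += 1;
                    discovery[v] = clock;
                    stack.push((v, 0));
                    EdgeType::Tree
                } else if !finished[v] {
                    // v is still on the stack: the edge closes a cycle.
                    EdgeType::Back
                } else if discovery[u] < discovery[v] {
                    // v was discovered after u and is done: a descendant of u.
                    EdgeType::Forward
                } else {
                    EdgeType::Cross
                };
                result.insert((u, v), kind);
            } else {
                finished[u] = true;
                stack.pop();
            }
        }
    }
    result
}
```

For the graph `0→1, 0→2, 1→2, 2→0, 3→1`, calling `classify_edges(&[vec![1, 2], vec![2], vec![0], vec![1]])` labels `2→0` as the only back edge.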
`cargo run -- ../../dataset/example.gfa`
## CLI Options
`cargo run -- ../../dataset/DRB1-3123_unsorted.gfa`
- `-i, --input <input>`: file to read
- Running the project with all the flags:
- `-c, --path_count <path_count>`: number of paths to visit when searching for the pattern (default: 1)
- `-p, --pattern <pattern>`: k-mer pattern to search (default: "ACGT")
- `-k, --kmer_size <kmer_size>`: k-mer length (default: 4)
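
The `-p` and `-k` options above feed a k-mer search based on a rolling hash. As a rough illustration of the technique, here is a small Rabin-Karp style sketch. It is not the project's `RollingHasher`: the function names `poly_hash` and `find_kmer` and the constants `BASE` and `MODULUS` are assumptions chosen for this example.

```rust
// Assumed parameters for the polynomial hash (not taken from the project).
const BASE: u64 = 256;
const MODULUS: u64 = 1_000_000_007;

/// Polynomial hash of a byte slice, reduced modulo MODULUS.
fn poly_hash(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0u64, |h, &b| (h * BASE + b as u64) % MODULUS)
}

/// Returns the start offsets of every occurrence of `pattern` in `sequence`.
fn find_kmer(sequence: &str, pattern: &str) -> Vec<usize> {
    let seq = sequence.as_bytes();
    let pat = pattern.as_bytes();
    let k = pat.len();
    if k == 0 || k > seq.len() {
        return Vec::new();
    }

    // BASE^(k-1) mod MODULUS: the weight of the byte that leaves the window.
    let mut high = 1u64;
    for _ in 1..k {
        high = high * BASE % MODULUS;
    }

    let target = poly_hash(pat);
    let mut window = poly_hash(&seq[..k]);
    let mut matches = Vec::new();

    for start in 0..=seq.len() - k {
        // On a hash hit, compare the bytes to rule out collisions.
        if window == target && &seq[start..start + k] == pat {
            matches.push(start);
        }
        if start + k < seq.len() {
            // Slide the window: drop seq[start], append seq[start + k].
            let leaving = seq[start] as u64 * high % MODULUS;
            window = (window + MODULUS - leaving) % MODULUS;
            window = (window * BASE + seq[start + k] as u64) % MODULUS;
        }
    }
    matches
}
```

Each window update costs constant time, so scanning a sequence of length n takes O(n) hash work plus a byte comparison on every hash hit. For instance, `find_kmer("ACGTACGT", "ACGT")` returns the offsets 0 and 4.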
## Usage
- To show the help message:
```
cargo run -- --help
```
- For example, to try out the `chrX` dataset:
```bash
GFA_URL='https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/pggb/chroms/chrX.hprc-v1.0-pggb.gfa.gz'
wget $GFA_URL -O dataset/chrX.hprc-v1.0-pggb.local.gfa.gz
gunzip dataset/chrX.hprc-v1.0-pggb.local.gfa.gz
cargo run --release -- -i dataset/chrX.hprc-v1.0-pggb.local.gfa -c 2 -p ACGT -k 3
```
`cargo run --release -- show -i dataset/chrX.hprc-v1.0-pggb.local.gfa -c 2 -p ACGT -k 3`
Other datasets are listed in [Note](#Note).
## Example GFA

@ -21,20 +21,6 @@ use rolling_hash::RollingHasher;
#[derive(FromArgs, PartialEq, Debug)]
/// CLI tool for the Algorithms and Data Structures 2024 project
struct CliTool {
#[argh(subcommand)]
nested: CliSubcommands,
}
#[derive(FromArgs, PartialEq, Debug)]
#[argh(subcommand)]
enum CliSubcommands {
Show(CommandShow),
}
#[derive(FromArgs, PartialEq, Debug)]
/// Parse and show the content of a file
#[argh(subcommand, name = "show")]
struct CommandShow {
#[argh(option, short = 'i')]
/// file to read
input: String,
@ -55,8 +41,6 @@ struct CommandShow {
fn main() -> std::io::Result<()> {
let opts = argh::from_env::<CliTool>();
match opts.nested {
CliSubcommands::Show(opts) => {
// validate opts.pattern is a valid DNA sequence
if opts.pattern.chars().any(|c| !"ACGT".contains(c)) {
eprintln!("Invalid pattern: {:?}", opts.pattern);
@ -70,8 +54,7 @@ fn main() -> std::io::Result<()> {
.progress_with(indicatif::ProgressBar::new_spinner())
.count() as u64;
let entries =
gfa::parser::parse_source(std::fs::File::open(opts.input)?, file_lines_count)?;
let entries = gfa::parser::parse_source(std::fs::File::open(opts.input)?, file_lines_count)?;
println!("Number of entries: {}", entries.len());
@ -173,8 +156,6 @@ fn main() -> std::io::Result<()> {
println!("Cleaning up...");
process::exit(0);
}
}
}
fn compute_kmer_histogram_lb(
sequence_map: &HashMap<String, String>,
