digitalmars.D.learn - How to parse epub content


Adnan <relay.public.adnan outlook.com> writes:

How would someone approach parsing epub files in D? Are there any 
libraries to parse XHTML?

Jan 11 2020

JN <666total wp.pl> writes:

On Saturday, 11 January 2020 at 12:38:38 UTC, Adnan wrote:
 How would someone approach parsing epub files in D? Are there 
 any libraries to parse XHTML?

XHTML is XML. There are libraries to parse XML, from std.xml in 
the standard library to libraries like dxml in the package 
repository.

Jan 11 2020

Adam D. Ruppe <destructionator gmail.com> writes:

On Saturday, 11 January 2020 at 12:38:38 UTC, Adnan wrote:
 How would someone approach parsing epub files in D? Are there 
 any libraries to parse XHTML?

I've done it before with my dom.d easily enough.

The epub itself is a zip file. You might simply unzip it ahead of 
time, or use std.zip to access the contents directly. (Basic zip 
file support is in Phobos.)
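
A minimal sketch of the std.zip route, assuming the chapter files inside the archive end in .xhtml or .html and are UTF-8 encoded. The helper name xhtmlMembers is hypothetical, not part of Phobos; it only relies on ZipArchive's constructor, its directory property, and expand().

```d
import std.algorithm.searching : endsWith;
import std.zip : ZipArchive;

/// Hypothetical helper: returns the decompressed text of every
/// .xhtml/.html member of an epub, keyed by its path inside the archive.
/// Assumes the members are UTF-8 encoded.
string[string] xhtmlMembers(void[] epubBytes)
{
    string[string] result;

    // An epub is an ordinary zip archive, so ZipArchive can read it from memory.
    auto archive = new ZipArchive(epubBytes);

    foreach (name, member; archive.directory)
    {
        // Skip the mimetype, container.xml, images, stylesheets and so on.
        if (name.endsWith(".xhtml", ".html"))
        {
            // expand() decompresses the member and returns its raw bytes.
            result[name] = cast(string) archive.expand(member).idup;
        }
    }
    return result;
}
```

Calling xhtmlMembers(std.file.read("book.epub")), where the path is a placeholder, gives one string per chapter file, and each string can then be handed to whichever XML parser you prefer.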

Then once you get inside, there are XHTML files, which again are 
easy enough to parse. With my dom.d it is as simple as:

import std.stdio : writeln;
import arsd.dom;

// the true, true here tells it to be case-sensitive and to use strict
// XML mode for XHTML; it isn't really necessary, so leaving it out is OK
auto document = new Document(string_holding_xml, true, true);

foreach(ele; document.querySelectorAll("p"))
    writeln(ele.innerText);



The API there is similar to the JavaScript DOM, if you're familiar 
with that.

Jan 11 2020
