java - JSoup: Problems getting text from elements


    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class Main {
        public static void main(String[] args) throws Exception {
            Document d = Jsoup.connect("https://osu.ppy.sh/u/charless").get();

            for (Element line : d.select("div.profilestatline")) {
                System.out.println(d.select("b").text());
            }
        }
    }

I'm having problems getting the text "2027pp (#97,094)" from div.profilestatline b. It should be printed, but it isn't. URL: https://osu.ppy.sh/u/charless

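Parts of the page are loaded with JavaScript, which is why you can't see the divs you're looking for: Jsoup only parses the HTML the server returns and does not execute scripts.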

You can use a browser to load the page and interpret the JavaScript before parsing. A library like webdrivermanager will help.

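    // ChromeDriverManager comes from the webdrivermanager library,
    // ChromeDriver from Selenium; Jsoup, Document and Element from jsoup.
    public static void main(String[] args) throws Exception {
        // Download and register a chromedriver binary matching the local setup.
        ChromeDriverManager.getInstance().setup();
        ChromeDriver chromeDriver = new ChromeDriver();
        chromeDriver.get("https://osu.ppy.sh/u/charless");

        // Hand the browser-rendered HTML to Jsoup for parsing.
        Document d = Jsoup.parse(chromeDriver.getPageSource());

        chromeDriver.close();

        for (Element line : d.select("div.profilestatline")) {
            System.out.println(line.select("b").text());
        }
    }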

The alternative is to examine the JavaScript in the page and make the same calls it does to retrieve the data.

The page is loading the profile from https://osu.ppy.sh/pages/include/profile-general.php?u=4084042&m=0. It looks like u is just the user ID, which is relatively simple to extract from the page:

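    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ProfileScraper {
        // Matches the user ID assignment in the page's inline script,
        // regardless of how the variable name is capitalized.
        private static final Pattern UID_PATTERN =
                Pattern.compile("var userid = (\\d+);", Pattern.CASE_INSENSITIVE);

        public static void main(String[] args) throws IOException {
            String uid = getUid("charless");
            Document d = Jsoup.connect("https://osu.ppy.sh/pages/include/profile-general.php?u=" + uid).get();

            for (Element line : d.select("div.profilestatline")) {
                System.out.println(line.select("b").text());
            }
        }

        // Loads the public profile page and searches its script tags for the user ID.
        public static String getUid(String name) throws IOException {
            Document d1 = Jsoup.connect("https://osu.ppy.sh/u/" + name).get();

            for (Element script : d1.select("script")) {
                String text = script.data();
                Matcher uidMatcher = UID_PATTERN.matcher(text);
                if (uidMatcher.find()) {
                    return uidMatcher.group(1);
                }
            }
            throw new IOException("No such character");
        }
    }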
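
To try the ID extraction without hitting the site, the regex step can be run on a plain string. This is only a minimal sketch: the sample script text, the class name UidExtractDemo and the helpers extractUid and profileUrl are made up for illustration, and the real inline script on the profile page may be formatted differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UidExtractDemo {
    // Assumed shape of the inline script: "var userId = <digits>;".
    // Whitespace is matched loosely and the name case-insensitively,
    // since the exact formatting on the page is not guaranteed.
    private static final Pattern UID_PATTERN =
            Pattern.compile("var\\s+userid\\s*=\\s*(\\d+)\\s*;", Pattern.CASE_INSENSITIVE);

    // Returns the first user ID found in the script text, if any.
    static Optional<String> extractUid(String scriptText) {
        Matcher m = UID_PATTERN.matcher(scriptText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Builds the include URL seen in the page; m=0 is copied from that URL as-is.
    static String profileUrl(String uid) {
        return "https://osu.ppy.sh/pages/include/profile-general.php?u=" + uid + "&m=0";
    }

    public static void main(String[] args) {
        // Hypothetical script content, standing in for Element.data() of a script tag.
        String sample = "var userId = 4084042; var mode = 0;";
        extractUid(sample)
                .map(UidExtractDemo::profileUrl)
                .ifPresent(System.out::println);
    }
}
```

Returning an Optional instead of throwing keeps the "no ID found" case explicit, so the caller can decide whether a missing ID means an unknown user or a changed page layout.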
