bash - Add a condition for "empty" string in Python
In a cyber/information security course, I was given a project to build a tool that extracts IP ranges from a table located on a website. The website also tells who the owner of these IP ranges is, and if there is no owner, the tool should use the whois bash command with grep to fill in the empty owner spots. The results are written to a file. Here is the code:
    #!/usr/bin/python
    # Python 2 script (urllib.urlopen does not exist in Python 3)
    from os import popen
    import bs4 as bs
    import urllib

    columnscounter = 0
    previousip = ""

    def ipcheck(currentip):
        # Returns the string back if it has four dot-separated parts of 1-3 chars, else None
        try:
            ipsplit = currentip.split(".")
            if 1 <= len(ipsplit[0]) <= 3 and 1 <= len(ipsplit[1]) <= 3 and 1 <= len(ipsplit[2]) <= 3 and 1 <= len(ipsplit[3]) <= 3:
                result = ".".join(ipsplit)
                return result
            else:
                return
        except:
            return

    web = urllib.urlopen('http://www.nirsoft.net/countryip/al.html').read()
    soup = bs.BeautifulSoup(web,'lxml')
    somedata = soup.find_all("table", {"border":"1","cellpadding":"6","bordercolor":"#000000"})
    itemslist = somedata[0].contents[2:]

    f = open("ip.db", "w")
    f.write("from ip\t\tto ip\t\tnum ips\tassign date\towner\n")
    f.close()
    f = open("ip.db", "a")
    for item in itemslist:
        row = item.text[1:].split(" ")
        for column in row:
            column = column.encode("utf-8")
            columnscounter += 1
            isip = ipcheck(column)
            if columnscounter >= 5 and not isip:
                # part of an owner name taken from the website
                f.write(column + " ")
            elif columnscounter == 6 and isip:
                # no owner on the website: ask whois about the previous range
                cmd = "whois {} | grep desc | tail -n 1".format(previousip)
                owner = popen(cmd).read().encode("utf-8")
                owner = "{}\n".format(owner[16:-1])
                f.write(owner)
                columnscounter = 1
            elif columnscounter > 5 and isip is not None:
                f.write("\n")
                columnscounter = 1
            if columnscounter <= 4:
                f.write(column + "\t")
            if columnscounter == 1:
                previousip = column
    f.close()

The output file looks like this:
> from ip        to ip            num ips  assign date  owner
> 31.22.48.0     31.22.63.255     4096     25/03/11     albanian mobile communications sh.a.
> 31.44.64.0     31.44.79.255     4096     24/02/11     abissnet sh.a.
> 46.99.0.0      46.99.255.255    65536    08/06/10     ipko-469900/22
> 46.252.32.0    46.252.47.255    4096     17/12/10     4alb shpk
> 77.242.16.0    77.242.31.255    4096     22/02/07     abissnet sh.a.
> 79.106.0.0     79.106.255.255   65536    23/11/07     albtelecom sh.a.
> 80.78.64.0     80.78.79.255     4096     04/07/01     abcom shpk
> 80.80.160.0    80.80.175.255    4096     17/07/01     ipko-8080160
> 80.90.80.0     80.90.95.255     4096     03/06/05     ada holding - ada air sh.p.k.
> 80.91.112.0    80.91.127.255    4096     09/06/05     abissnet sh.a.
> 82.114.64.0    82.114.95.255    8192     22/12/03     kujtesa network
> 84.20.64.0     84.20.95.255     8192     02/09/04     pronet sh.p.k.
> 91.187.96.0    91.187.127.255   8192     24/11/06     ipko-9118796
> 92.60.16.0     92.60.31.255     4096     30/11/07     abissnet sh.a.
> 95.107.128.0   95.107.255.255   32768    02/12/08     "albanian satellite communications" sh.p.k.
> 109.104.128.0  109.104.159.255  8192     04/09/09     itirana sh.p.k.
> 109.236.32.0   109.236.47.255   4096     30/11/09     abissnet sh.a.
> 134.0.32.0     134.0.63.255     8192     01/11/11     agjencia kombetare shoqerise se informacionit
> 213.207.32.0   213.207.63.255   8192     22/12/05     vivo communications sh p k
> 217.21.144.0   217.21.159.255   4096     21/10/10     nisatel ltd
> 217.24.240.0   217.24.255.255   4096     14/05/03     albtelecom sh.a.
> 217.73.128.0   217.73.143.255   4096     17/01/11     abcom shpk

The problem is: under the owner column there are a few "owners" that have a couple of spaces at the beginning of the owner name. These are the owner names that were filled in by the whois bash command. I found out that these spaces are added by the following Python lines:
    if columnscounter >= 5 and not isip:
        f.write(column + " ")

From my investigation, I found out that this happens when the variable column is equal to one of the blank owner cells on the website. If I run print column, I get what looks like two spaces (2x spacebar), and its length is 2.
My question is: I don't want to filter by length (who knows, maybe there is an owner name that contains only 2 chars). Additionally, the condition if column == "  ": (or any variation of that string) is not working. How can I find out what this string is and filter it?
As mentioned in the comments, if your filter isn't matching, it's probable that these aren't standard spacebar characters. The Python function ord(ch) takes a character ch and gives you the numeric representation of that character, so you can disambiguate it from other similar-looking characters.
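As a rough sketch of that inspection step in Python 3: the helper name describe_chars and the sample value are hypothetical, and the no-break space (U+00A0) is only a guess at what the blank cell might contain, since the actual character is not known here.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(ord(ch), unicodedata.name(ch, "UNNAMED")) for ch in text]


# Hypothetical stand-in for the blank owner cell: two no-break spaces.
# The real cell may contain a different character; inspect it the same way.
suspect = "\u00a0\u00a0"

for code_point, name in describe_chars(suspect):
    print(code_point, name)

# The value prints like two spaces but does not compare equal to them.
print(suspect == "  ")
```

Running the same helper on the real column value (before it is encoded to bytes) should show which code points the comparison needs to match.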
Of course, you could filter that character out once you discover what it is. Another method, maybe a better one, is to "sanitize" your strings: remove anything that's not alphabetical or alphanumeric or something similar, so that the odd whitespace is removed too, allowing you to filter on the empty string.
For example, you can sanitize a column that has unexpected characters in it; in this case, the string contains a digit, non-English characters, and a symbol from the extended ASCII table that's not alphanumeric. If you wanted to keep everything except that symbol, you could do:
    >>> column = "5ome çharß ¼"
    >>> "".join([c for c in column if c.isalpha() or c.isdigit() or c == ' '])
    '5ome çharß '

It kept the digits, the alpha characters, even the non-English ones, and the spaces, but not the ¼ symbol. You can then check whether the sanitized string is equal to the empty string. I think this solution is nice because it generalizes relatively well.
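Wrapped into functions, the sanitize-then-compare idea might look like the following Python 3 sketch. The names sanitize and is_blank are hypothetical, and the no-break space sample is again only an assumed example of an odd "blank" value.

```python
def sanitize(text):
    """Keep letters, digits and plain spaces; drop every other character."""
    return "".join(c for c in text if c.isalpha() or c.isdigit() or c == " ")


def is_blank(text):
    """True when nothing is left after sanitizing."""
    return sanitize(text) == ""


print(repr(sanitize("5ome çharß ¼")))  # the ¼ symbol is dropped
print(is_blank("\u00a0\u00a0"))        # assumed odd blank: no-break spaces
print(is_blank("ab"))                  # a short but real owner name
```

Note that plain spaces are deliberately kept, so a value made only of ordinary spaces is not reported as blank by this check alone.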
If you're worried there still might be spaces, you can .strip() the string, which by default strips whitespace, and if that's the only thing in there, it'll give you an empty string. There's also the 'some string'.isspace() method, which checks whether a string contains only whitespace. You could possibly use it on the original weird column value; I'm not sure whether that character counts as whitespace in this function or not, since I don't know what the character is.
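A small Python 3 sketch of both checks follows. The helper name is_effectively_empty is hypothetical, and the sample characters are assumptions chosen to show one odd character that counts as whitespace and one that does not; this applies to text strings, so it would need to run before the value is encoded to bytes.

```python
def is_effectively_empty(text):
    """True when the string is empty or contains only whitespace."""
    return text.strip() == ""


samples = {
    "plain spaces": "  ",
    "no-break spaces": "\u00a0\u00a0",   # treated as whitespace by str methods
    "zero-width space": "\u200b",        # not treated as whitespace
    "two-letter owner": "ab",
    "empty string": "",
}

for label, value in samples.items():
    print(label, is_effectively_empty(value), value.isspace())
```

One difference worth keeping in mind: isspace() returns False for the empty string, while the strip() comparison treats it as empty.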