bash - Add a condition for "empty" string in Python


In my cyber and information security course I was given a project to build a tool that extracts IP ranges from a table located on a website. The website lists the owner of each IP range, and if there is no owner, the tool should use the whois bash command with grep to fill in the empty owner spots. The results are written to a file. Here is the code:

    #!/usr/bin/python
    # Python 2 script (it relies on urllib.urlopen).
    from os import popen
    import bs4 as bs
    import urllib

    columnscounter = 0
    previousip = ""

    def ipcheck(currentip):
        # Return the string if it looks like a dotted quad, otherwise None.
        try:
            ipsplit = currentip.split(".")
            if 1 <= len(ipsplit[0]) <= 3 and 1 <= len(ipsplit[1]) <= 3 and 1 <= len(ipsplit[2]) <= 3 and 1 <= len(ipsplit[3]) <= 3:
                result = ".".join(ipsplit)
                return result
            else:
                return
        except:
            return

    web = urllib.urlopen('http://www.nirsoft.net/countryip/al.html').read()
    soup = bs.BeautifulSoup(web, 'lxml')
    somedata = soup.find_all("table", {"border": "1", "cellpadding": "6", "bordercolor": "#000000"})
    itemslist = somedata[0].contents[2:]

    # Write the header, then reopen the file in append mode for the rows.
    f = open("ip.db", "w")
    f.write("from ip\t\tto ip\t\tnum ips\tassign date\towner\n")
    f.close()
    f = open("ip.db", "a")

    for item in itemslist:
        row = item.text[1:].split(" ")
        for column in row:
            column = column.encode("utf-8")
            columnscounter += 1
            isip = ipcheck(column)
            if columnscounter >= 5 and not isip:
                f.write(column + " ")
            elif columnscounter == 6 and isip:
                # No owner on the website: ask whois about the previous range.
                cmd = "whois {} | grep desc | tail -n 1".format(previousip)
                owner = popen(cmd).read().encode("utf-8")
                owner = "{}\n".format(owner[16:-1])
                f.write(owner)
                columnscounter = 1
            elif columnscounter > 5 and isip is not None:
                f.write("\n")
                columnscounter = 1
            if columnscounter <= 4:
                f.write(column + "\t")
                if columnscounter == 1:
                    previousip = column
    f.close()

The output file looks like this:

> from ip       to ip           num ips assign date owner
> 31.22.48.0    31.22.63.255    4096    25/03/11    albanian mobile communications sh.a.
> 31.44.64.0    31.44.79.255    4096    24/02/11    abissnet sh.a.
> 46.99.0.0     46.99.255.255   65536   08/06/10      ipko-469900/22
> 46.252.32.0   46.252.47.255   4096    17/12/10    4alb shpk
> 77.242.16.0   77.242.31.255   4096    22/02/07    abissnet sh.a.
> 79.106.0.0    79.106.255.255  65536   23/11/07    albtelecom sh.a.
> 80.78.64.0    80.78.79.255    4096    04/07/01    abcom shpk
> 80.80.160.0   80.80.175.255   4096    17/07/01      ipko-8080160
> 80.90.80.0    80.90.95.255    4096    03/06/05    ada holding - ada air sh.p.k.
> 80.91.112.0   80.91.127.255   4096    09/06/05    abissnet sh.a.
> 82.114.64.0   82.114.95.255   8192    22/12/03      kujtesa network
> 84.20.64.0    84.20.95.255    8192    02/09/04    pronet sh.p.k.
> 91.187.96.0   91.187.127.255  8192    24/11/06      ipko-9118796
> 92.60.16.0    92.60.31.255    4096    30/11/07    abissnet sh.a.
> 95.107.128.0  95.107.255.255  32768   02/12/08    "albanian satellite communications" sh.p.k.
> 109.104.128.0 109.104.159.255 8192    04/09/09    itirana sh.p.k.
> 109.236.32.0  109.236.47.255  4096    30/11/09    abissnet sh.a.
> 134.0.32.0    134.0.63.255    8192    01/11/11    agjencia kombetare shoqerise se informacionit
> 213.207.32.0  213.207.63.255  8192    22/12/05    vivo communications sh p k
> 217.21.144.0  217.21.159.255  4096    21/10/10    nisatel ltd
> 217.24.240.0  217.24.255.255  4096    14/05/03    albtelecom sh.a.
> 217.73.128.0  217.73.143.255  4096    17/01/11    abcom shpk

The problem is that under the owner column a few owners have a couple of spaces at the beginning of the owner name. These are the owner names that were filled in by the whois bash command. I found out that these spaces are added by the following Python lines:

    if columnscounter >= 5 and not isip:
        f.write(column + " ")

From my investigation, I found out that this happens when the variable column is equal to one of the blank owners on the website. If I run print column, it prints what looks like two presses of the spacebar, and its length is 2.

My question is: I don't want to filter by length (who knows, maybe there is an owner name that contains only 2 characters). Additionally, the condition if column == "  ": (or any variation of that string) is not working. How can I find out what this string is and filter it?

As mentioned in the comments, if your filtering isn't matching, it's probable that these aren't standard spacebar characters. The Python function ord(ch) takes a character ch and gives you its numeric representation, which you can use to distinguish it from other similar-looking characters.
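A minimal sketch of that inspection, assuming Python 3. The sample value (two non-breaking spaces, U+00A0) is only an assumption for illustration, since the actual character on the website is unknown, and the helper name describe_chars is hypothetical:

```python
import unicodedata


def describe_chars(text):
    """Return (repr, code point, Unicode name) for each character in text."""
    return [(repr(ch), ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Assumed sample value: two non-breaking spaces, which print like ordinary spaces.
column = "\u00a0\u00a0"

for shown, code, name in describe_chars(column):
    print(shown, code, name)

# An ordinary space has code point 32, so comparing against "  " fails here.
print(column == "  ")
```

Printing repr(column) on the real value is a quick first check as well, because it shows escape sequences instead of the invisible characters themselves.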

Of course, you can filter that character out once you discover what it is. Another method, maybe a better one, is to "sanitize" your strings: remove anything that's not alphabetical or alphanumeric or something similar, so that the odd whitespace is removed as well, allowing you to filter on the empty string.

For example, you can sanitize a column that has unexpected characters in it. In this case, the string contains a digit, some non-English characters, and a symbol from the extended ASCII table that is not alphanumeric. If you wanted to keep everything except the symbol, you could do:

    >>> column = "5ome çharß ¼"
    >>> "".join([c for c in column if c.isalpha() or c.isdigit() or c == ' '])
    '5ome çharß '

It kept the digits, the alpha characters (including the non-English ones), and the spaces, but not the ¼ symbol. You can then check whether the sanitized string is equal to the empty string. I think this solution is nice because it generalizes relatively well.
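The sanitize-then-compare idea can be sketched as follows, assuming Python 3 text strings. The helper names sanitize and is_blank_owner are hypothetical, and the non-breaking-space sample is again only an assumed stand-in for the blank cell:

```python
def sanitize(text):
    """Keep only letters, digits, and ordinary spaces, then trim both ends."""
    kept = "".join(c for c in text if c.isalpha() or c.isdigit() or c == " ")
    return kept.strip()


def is_blank_owner(text):
    """True when nothing meaningful is left after sanitizing."""
    return sanitize(text) == ""


print(is_blank_owner("\u00a0\u00a0"))  # assumed blank cell made of non-breaking spaces
print(is_blank_owner("ab"))            # a real two-character owner name is not blank
```

Because sanitize also drops punctuation such as the dots in "sh.a.", it is probably best used only for the emptiness test, while the original column value is what gets written to the file.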

If you're worried that there still might be spaces left, you can .strip() the string, which by default strips whitespace, and if that's the only thing in there, it'll give you the empty string. There's also the 'some string'.isspace() method, which checks whether a string contains only whitespace. You could possibly use that on the original weird column value; I'm not sure whether the character counts as whitespace for that method, because I don't know what the character is.
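A small sketch of how .strip() and .isspace() behave on a few candidate values, assuming Python 3 text strings. The candidates are assumptions chosen for illustration, not the confirmed content of the blank cells. The question's script encodes each column to UTF-8 bytes first, and byte strings may be classified differently, so it is probably safer to run the check before encoding:

```python
# Candidate values for the mystery column (all assumed, for illustration only).
samples = {
    "ordinary spaces": "  ",
    "non-breaking spaces": "\u00a0\u00a0",
    "zero-width spaces": "\u200b\u200b",  # invisible, but not classified as whitespace
    "two-letter owner": "ab",
}

for label, value in samples.items():
    # Report whether the value is all whitespace and whether strip() empties it.
    print(label, value.isspace(), value.strip() == "")
```

If the real value turns out to behave like the zero-width case, .strip() alone would not empty it, and the sanitizing approach or an explicit check on the code point from ord() would be needed instead.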

