Daniel
Preternatural

Registered: Apr 2002
Location: On a collision course with reality.
Posts: 334

Regular Expression help request

How can I use regular expressions to match any word/string that contains a part which depends entirely on an earlier match in the same string?

For a simplified example, I would like to match all strings starting and ending with the same letter (Example string: "car noon treat boys hannah". I would like a regexp that matches "noon", "treat", and "hannah", capturing "n", "t", and "h").

For bonus points: I would like to take this problem further though, particularly in matching html tags:

I.e.: I would like to scan my entire HTML document and match all 2-part HTML tags, such as "<b>BOLD TEST</b>", "<a href=whatever>sometext</a>", and "<img src="image.jpg">sometext</img>", capturing "b", "a", and "img".

I don't need involved help on the second part (unless someone happens to know it offhand). I'm sure that I can extrapolate the solution to the html tag matcher by knowing the solution to the first problem I posed.

thanks!

__________________
Actually, I prefer a cold body in a warm bed.

Last edited by Daniel on 01-19-2006 at 01:40 AM


Old Post 01-19-2006 12:54 AM
macker
Holy Me-el

Registered: Nov 2000
Location: UK
Posts: 4737

Your first problem is palindromic in nature and there's some stuff out there for solving it (a common example is finding balanced brackets to extract function snippets from code). But assuming you split the sentence into words yourself, something as simple as:

code:
$word =~ /^(\w)\w*\1$/   # $1 holds the shared first/last letter


Would work.
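For readers who want to try the backreference idea outside Perl, here is a minimal sketch using Python's re module (Python is used only for illustration; the thread itself is about Perl). The function name shared_letters is made up for this example, and the middle of the pattern is \w* so that two-letter words such as "aa" also qualify.

```python
import re

# \1 must repeat whatever group 1 captured, i.e. the word's first letter.
SAME_ENDS = re.compile(r"^(\w)\w*\1$")

def shared_letters(sentence):
    """Return (word, letter) pairs for words that start and end with the same letter."""
    pairs = []
    for word in sentence.split():  # split into words first, as suggested above
        m = SAME_ENDS.match(word)
        if m:
            pairs.append((word, m.group(1)))
    return pairs

print(shared_letters("car noon treat boys hannah"))
# [('noon', 'n'), ('treat', 't'), ('hannah', 'h')]
```

The comparison is case-sensitive as written, so "Anna" would not match unless re.IGNORECASE is passed to re.compile.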

HTML tags though are a different issue as the tags themselves aren't balanced against each other. You might be able to massage Text::Balanced's extract_bracketed function to do it for you though.
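For the simple, non-nested case, the same backreference trick carries over to tag names. The sketch below is in Python's re module for illustration only, and it assumes the paired tags are never nested inside a tag of the same name; TAG_PAIR and the sample string are made up for this example.

```python
import re

# Group 1 captures the tag name; </\1> requires a closing tag with that same name.
# \b stops the name from backtracking to a prefix (e.g. "b" out of "bold").
TAG_PAIR = re.compile(r"<(\w+)\b[^>]*>(.*?)</\1>", re.DOTALL)

html = '<b>BOLD TEST</b> <a href=whatever>sometext</a> <img src="image.jpg">sometext</img>'

names = [m.group(1) for m in TAG_PAIR.finditer(html)]
print(names)
# ['b', 'a', 'img']
```

Once a tag can contain another tag of the same name, the lazy .*? stops at the first closing tag it meets, which is why nesting needs a different approach.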

I get the vague impression though that either you're trying to be too clever or solving a problem in a stupid manner.

__________________
Expecting people to be smart team players is like looking for double Ds in an oriental brothel.


Old Post 01-19-2006 04:53 PM
Daniel

Probably both, actually.

Thanks, I eventually figured out the solution you gave above. It was simple of course, I had just never used backreferences before.

My next problem is quite frustrating: I'm trying to match the smallest possible tag-pair that is nested inside other tag-pairs of the same kind, and may also have other unrelated tags inside it (so no simple [^<]* trickery).

E.g.: "<ul><li>1<li>2<ul><li>2a<li>2b</ul><li>3</ul>"

I want to match "<ul><li>2a<li>2b</ul>". The "<ul>" is actually a literal. I'm going bald quickly from all the hair pulling...

I feel that this is simple and I am overlooking something obvious, but the best I've been able to get is "<ul><li>1<li>2<ul><li>2a<li>2b</ul>". I need some way to specify "a string that starts with <ul>, contains no other <ul>, and ends with </ul>".
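The too-long result can be reproduced with a plain lazy pattern. This is a sketch in Python's re module (for illustration only; the pattern shown is an assumption about what was tried, not a quote from the thread). A regex engine reports the leftmost match, so the lazy .*? only shortens the right-hand end; it never moves the start forward to the inner <ul>.

```python
import re

html = "<ul><li>1<li>2<ul><li>2a<li>2b</ul><li>3</ul>"

# Lazy quantifier: the match still begins at the FIRST <ul> in the string
# and simply stops at the first </ul> it can reach.
naive = re.search(r"<ul[^>]*>.*?</ul>", html, re.DOTALL)
print(naive.group(0))
# <ul><li>1<li>2<ul><li>2a<li>2b</ul>
```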


Last edited by Daniel on 01-22-2006 at 04:20 AM


Old Post 01-22-2006 02:25 AM
Daniel

I figured it out. Unfortunately I had to do a look-ahead on a character by character basis, but it works:

$word =~ /<ul[^>]*?>(.(?!<\/?ul>))*.(?=<\/ul>)<\/ul>/

Now it will match only <ul> tag-pairs that don't contain further <ul> nesting.
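A common way to write this kind of per-character lookahead puts the check in front of each character instead of behind it. Below is a minimal sketch in Python's re module (for illustration only; INNERMOST_UL is a made-up name). Testing the lookahead before each character is consumed also covers a <ul> nested immediately after the opening tag, as well as an empty list.

```python
import re

# (?:(?!</?ul\b).)* consumes one character at a time, but only while the
# current position does not start another <ul ...> or </ul> tag.
INNERMOST_UL = re.compile(r"<ul\b[^>]*>(?:(?!</?ul\b).)*</ul>", re.DOTALL)

html = "<ul><li>1<li>2<ul><li>2a<li>2b</ul><li>3</ul>"
print(INNERMOST_UL.findall(html))
# ['<ul><li>2a<li>2b</ul>']
```

This only finds lists with no further <ul> nesting; it is not a general HTML parser and will be confused by things like "<ul" inside comments or attribute values.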


Old Post 01-23-2006 03:58 AM
macker

Cool. I was going to suggest a lookahead but wanted to reread the perlre perldoc to make sure it would actually do what you wanted (I think I've used lookaheads twice in my entire life, and regretted it a few months later when re-reading the code).


Old Post 01-24-2006 03:59 PM
All times are GMT.